The frequentist behavior of nonparametric Bayes estimates, more specifically, rates of contraction of the posterior distributions to shrinking $L^r$-norm neighborhoods, $1 \le r \le \infty$, of the unknown parameter, are studied. A theorem for nonparametric density estimation is proved under general approximation-theoretic assumptions on the prior. The result is applied to a variety of common examples, including Gaussian process, wavelet series, normal mixture and histogram priors. The rates of contraction are minimax-optimal for $1 \le r \le 2$, but deteriorate as $r$ increases beyond 2. In the case of Gaussian nonparametric regression a Gaussian prior is devised for which the posterior contracts at the optimal rate in all $L^r$-norms, $1 \le r \le \infty$.



1. Introduction. In finite-dimensional statistical models the Bernstein-von Mises theorem provides a frequentist justification of the use of Bayesian methods. In the case of infinite-dimensional models, consistency properties in weak metrics hold under relatively mild conditions; see Schwartz [28]. Consistency in stronger metrics was considered by Barron, Schervish and Wasserman [1] and by Ghosal, Ghosh and Ramamoorthi [9], and, shortly after, Ghosal, Ghosh and van der Vaart [10] and Shen and Wasserman [30] developed techniques that allow us to prove frequentist rates of contraction of the posterior to the true infinite-dimensional parameter in the Hellinger metric, if the prior is suitably chosen according to the structure of the nonparametric problem at hand. This led to further progress recently; we refer to [11, 12, 32, 34] and the references therein.

This literature has been successful in generalizing the scope of these tech- 
niques to a variety of different statistical models, and has naturally focussed 
on consistency and rates of contraction results in the Hellinger distance. For 



Received March 2011; revised September 2011. 

AMS 2000 subject classifications. Primary 62G20; secondary 62G07, 62G08. 
Key words and phrases. Rate of contraction, posterior, nonparametric hypothesis test- 
ing. 



This is an electronic reprint of the original article published by the 
Institute of Mathematical Statistics in The Annals of Statistics, 
2011, Vol. 39, No. 6, 2883-2911. This reprint differs from the original in 
pagination and typographic detail. 




2 E. GINE AND R. NICKL 

instance, if $p_0$ is the unknown density to be estimated, and if $\Pi(\cdot \mid X_1, \dots, X_n)$ is the posterior based on a prior $\Pi$ and a sample $X_1, \dots, X_n$ with joint law $P_0^n$, results of the kind

(1) $\Pi\{p : h(p, p_0) > \varepsilon_n \mid X_1, \dots, X_n\} \to 0$ in $P_0^n$-probability

were established, where $h^2(f,g) = \int (\sqrt{f} - \sqrt{g})^2$ is the Hellinger metric and where $\varepsilon_n \to 0$. Such posterior contraction results are known to imply the same frequentist consistency rate $\varepsilon_n$, also in the metric $h$, for the associated formal Bayes estimators.
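
As a small numerical illustration of the Hellinger metric appearing in (1), the following sketch approximates $h(f,g)$ for two densities on $[0,1]$ by a midpoint Riemann sum. The function name `hellinger`, the grid size and the two example densities are illustrative assumptions and are not taken from the article.

```python
import math

def hellinger(f, g, m=100_000):
    """Approximate h(f, g) = sqrt( int_0^1 (sqrt f - sqrt g)^2 ) by a midpoint rule.

    f, g: densities on [0, 1], given as callables (assumed nonnegative).
    m: number of grid cells (an arbitrary illustrative choice).
    """
    total = 0.0
    for i in range(m):
        x = (i + 0.5) / m  # midpoint of the i-th cell
        total += (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2
    return math.sqrt(total / m)

# Example (assumed densities): uniform on [0, 1] versus p(x) = 2x.
h_example = hellinger(lambda x: 1.0, lambda x: 2.0 * x)
```

For this particular pair the squared distance can also be computed by hand as $2 - 4\sqrt{2}/3$, which gives a simple check of the numerical approximation.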

In this article we investigate the question of how to generalize results of this kind to more general loss-functions than the Hellinger metric, with a particular focus on $L^r$-norms, $1 \le r \le \infty$. Such results are of interest for a variety of reasons, for example, the construction of simultaneous confidence bands, or for plug-in procedures that require control of nonparametric remainder terms (e.g., in the proof of the Bernstein-von Mises theorem in semiparametric models in Castillo [6]). They are also of interest with a view on a more unified understanding of nonparametric Bayes procedures that complements the existing $L^r$-type results for standard frequentist methods.

The main challenge in extending the theory to the $L^r$-case, except for specific conjugate situations discussed below, rests in generalizing the Le Cam-Birgé testing theory for the Hellinger metric to more general situations. A main ingredient of the proof of a result of the kind (1) is that, in testing problems of the form

(2) $H_0 : p = p_0$ against $H_A : p \in \{p : h(p, p_0) > \varepsilon_n\}$,

universal tests with concentration bounds on type-two errors of the type $e^{-Cn\varepsilon_n^2}$ exist, under assumptions on the size, or entropy, of the "alternative" space defining $H_A$. This fact is rooted in the subtle connection between nonparametric testing problems and the Hellinger metric as highlighted in the work of Le Cam [21] and Birgé [2]. A main contribution of this article is the development of a new approach to testing problems of the kind (2) based on concentration properties of linear centered kernel-type density estimators, derived from empirical process techniques. While this approach can only be used if one has sufficient control of the approximation properties of the support of the prior, it can be generalized to arbitrary $L^r$-metrics, including the supremum norm $\|f\|_\infty = \sup_x |f(x)|$. The concentration properties of these tests depend on the geometry of the $L^r$-norm and deteriorate as $r \to \infty$, which is, in a sense, dual to the fact that the minimax testing rate in the sense of Ingster [20] approaches the minimax rate of estimation as $r \to \infty$.
While our main results can be viewed as "abstract" in that they replace the entropy conditions in [10] for sieve sets $\mathcal{P}_n$ by general approximation-theoretic conditions (see Theorems 2 and 3 below), our findings become most transparent by considering specific examples, selected in an attempt to reflect the spectrum of situations that can arise in Bayesian nonparametrics: In Section 2 we study the "ideal" situation of a simple uniform wavelet prior on a Hölder ball, the "supersmooth" situation of mixtures of normals, the case of random histograms based on a Dirichlet process where no uniform bound on the $L^\infty$-norm of the support of the prior is available, as well as Gaussian process priors of the kind studied in [32]. The general conclusion is that if $f_0$ is $\alpha$-smooth, then the rate of contraction obtained in the $L^r$-norm for a posterior based on an adequately chosen prior of smoothness $\alpha$ is, up to $\log n$ factors, and with $\bar r = \max(2, r)$,

(3) $\left(\frac{1}{n}\right)^{(\alpha - 1/2 + 1/\bar r)/(2\alpha + 1)}$.

So as soon as $r \le 2$ our proof retrieves the minimax optimal rate, but for $r > 2$ the rate deteriorates by a genuine power of $n$. As $\alpha$ approaches infinity this effect becomes more lenient and vanishes in the limit.
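
To make the size of this deterioration concrete, the sketch below compares the exponent $(\alpha - 1/2 + 1/\bar r)/(2\alpha+1)$, $\bar r = \max(2,r)$, of the rate stated in the introduction with the usual minimax exponent $\alpha/(2\alpha+1)$, ignoring logarithmic factors. The function names are hypothetical and chosen only for this illustration.

```python
def contraction_exponent(alpha, r):
    """Exponent e such that the contraction rate is n**(-e), log factors ignored.

    alpha: smoothness, alpha > 0.
    r: index of the L^r-norm, r >= 1; float('inf') stands for the sup-norm.
    """
    r_bar = max(2.0, r)
    return (alpha - 0.5 + 1.0 / r_bar) / (2.0 * alpha + 1.0)

def minimax_exponent(alpha):
    """Exponent alpha / (2 alpha + 1) of the usual minimax rate."""
    return alpha / (2.0 * alpha + 1.0)

def exponent_gap(alpha, r):
    """Loss in the exponent relative to the minimax rate (zero whenever r <= 2)."""
    return minimax_exponent(alpha) - contraction_exponent(alpha, r)
```

For instance, `exponent_gap(alpha, float('inf'))` equals $(1/2)/(2\alpha+1)$, which tends to zero as $\alpha$ grows, in line with the remark that the effect vanishes in the limit.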

We currently have no proof of the fact that our general theorem gives the right rate for Bayesian posteriors if $r > 2$; similar problems are known with nonparametric maximum likelihood estimators in $L^r$-metrics (cf. the proof of Proposition 6 in [27]). While we do not settle the issue of optimality of our rates for $r > 2$ in this article, we also prove in Theorem 1 below that in nonparametric Gaussian regression the minimax rate of contraction can be obtained by certain diagonal Gaussian wavelet priors, in all $L^r$-norms simultaneously. We believe that this result is closely tied to the fact that the posterior is then itself Gaussian, and conjecture that our rates cannot be substantially improved in the nonconjugate situation.

2. Main results. Let $\mathcal{P}$ be a class of probability densities on $[0,1]$ or $\mathbb{R}$, and let $X_1, \dots, X_n$ be a random sample drawn from some unknown probability density $p_0$ with joint law the first $n$ coordinate projections of the infinite product probability measure $P_0^{\mathbb{N}}$. Suppose one is given a prior probability distribution $\Pi$ defined on some $\sigma$-algebra $\mathcal{B}$ of $\mathcal{P}$. The posterior is the random probability measure

$\Pi(B \mid X_1, \dots, X_n) = \frac{\int_B \prod_{i=1}^n p(X_i)\, d\Pi(p)}{\int_{\mathcal{P}} \prod_{i=1}^n p(X_i)\, d\Pi(p)}, \quad B \in \mathcal{B}.$

We wish to analyze contraction properties of the posterior distribution under certain regularity conditions on $\Pi$ and $p_0$, and these regularity properties can be conveniently characterized by wavelet theory.

2.1. Function spaces and wavelets. For $T = \mathbb{R}$ or $T = [0,1]$, $f : T \to \mathbb{R}$, we shall write $\|f\|_\infty = \sup_{x \in T} |f(x)|$, the norm on the space $C(T)$ of bounded continuous real-valued functions defined on $T$. We shall use wavelet theory throughout; see [19, 26]. Let $\phi, \psi$ be the scaling function and wavelet of a multiresolution analysis of the space $L^2(T)$ of square integrable real-valued functions on $T$. We shall say that the wavelet basis is $S$-regular if $\phi, \psi$ are $S$-times continuously differentiable on $T$. For instance we can take Daubechies wavelets on $T = \mathbb{R}$ of sufficiently large order $N$ (see [26]) and define the translated scaling functions and wavelets

(4) $\phi_k = \phi(\cdot - k), \quad \psi_{\ell k} = 2^{\ell/2} \psi(2^\ell(\cdot) - k), \quad \ell \in \mathbb{N} \cup \{0\},\ k \in \mathbb{Z},$

which form an orthonormal basis of $L^2(\mathbb{R})$.

For $T = [0,1]$ we consider the orthonormal wavelet bases of $L^2([0,1])$ constructed in Theorem 4.4 of Cohen, Daubechies and Vial [8]. Each such basis is built from a Daubechies scaling function $\phi$ and its corresponding wavelet $\psi$ of order $N$, starting at a fixed resolution level $J_0$ such that $2^{J_0} \ge 2N$ (see Theorem 4.4 in [8]): the $\psi_{\ell k}, \phi_k$ that are supported in the interior of $[0,1]$ are all kept, and suitable boundary corrected wavelets are added, so that the $\{\phi_k, \psi_{\ell k} : 0 \le k < 2^\ell,\ \ell \in \mathbb{N},\ \ell \ge J_0\}$ still form an orthonormal basis for $L^2([0,1])$. While formula (4) now only applies to the "interior" wavelets, one can still write $\phi_{jk} = 2^{j/2} \phi_k(2^j \cdot)$ for every $k$, $j \ge J_0$; cf. page 73 in [8] and also after Condition 1 below.

Definition 1. Let $T = [0,1]$ or $T = \mathbb{R}$, and let $1 \le p, q \le \infty$, $0 \le s < S$, $s \in \mathbb{R}$, $S \in \mathbb{N}$. Let $\phi, \psi$ be bounded, compactly supported $S$-regular scaling function and wavelet, respectively, and denote by $\alpha_k(f) = \int_T \phi_k f$ and $\beta_{\ell k}(f) = \int_T \psi_{\ell k} f$ the wavelet coefficients of $f \in L^p(T)$. The Besov space $B^s_{pq}(T)$ is defined as the set of functions $\{f \in L^p(T) : \|f\|_{s,p,q} < \infty\}$ where

$\|f\|_{s,p,q} := \|\alpha_{(\cdot)}(f)\|_p + \left( \sum_{\ell=0}^{\infty} \left( 2^{\ell(s + 1/2 - 1/p)} \|\beta_{\ell(\cdot)}(f)\|_p \right)^q \right)^{1/q},$

with the obvious modification in case $q = \infty$.

Remark 1. We note the following standard embeddings/identifications we shall use (cf. [19, 26]): for $C^s(T)$ the Hölder (-Zygmund in case $s$ integer) spaces on $T$, we have $B^s_{\infty\infty}(T) = C^s(T)$. Moreover $B^s_{22}(T) = H^s(T)$ where $H^s(T)$ are the standard $L^2$-Sobolev spaces. We also have the "Sobolev-type" imbeddings $B^s_{rq}(T) \subset B^{s - 1/r + 1/t}_{tq}(T)$ for $t \ge r$, $1 \le q \le \infty$. Finally, if $T = [0,1]$, then $C^\alpha(T) \subset B^\alpha_{r\infty}(T)$ for every $r \le \infty$, where $C^\alpha(T) = \{f : T \to \mathbb{R} : \|f\|_{\alpha,\infty} < \infty\}$, with $\|f\|_{\alpha,\infty} := \sum_{k=0}^{\alpha} \|f^{(k)}\|_\infty$, $\alpha \in \mathbb{N}$.

2.2. Uniform wavelet series. Let us consider first the case where an a priori upper bound on the Hölder norm $\|p_0\|_{\alpha,\infty,\infty}$ is available, so that the prior can be chosen to have bounded support in $C^\alpha([0,1])$. An example is obtained by uniformly distributing wavelet coefficients on a Hölder ball. Let $\{\phi_k, \psi_{\ell k}\}$ be an $N$-regular CDV-wavelet basis for $L^2([0,1])$, let $u_{\ell k}$ be i.i.d. $U(-B, B)$ random variables, and define, for $\alpha < N$, the random wavelet series

(5) $U_\alpha(x) = \sum_k u_{0k} \phi_k(x) + \sum_{\ell = J_0}^{\infty} \sum_k 2^{-\ell(\alpha + 1/2)} u_{\ell k} \psi_{\ell k}(x),$

which has trajectories in $C^\alpha([0,1]) \subset B^\alpha_{r\infty}([0,1])$, $1 \le r \le \infty$, almost surely (in view of Definition 1 and Remark 1). Since moreover $\|U_\alpha\|_\infty \le C(B, \alpha, \psi)$, and since the exponential map has bounded derivatives on bounded subsets
of $\mathbb{R}$, the same applies to the random density

$\frac{e^{U_\alpha(x)}}{\int_0^1 e^{U_\alpha(y)}\, dy}, \quad x \in [0,1],$

whose induced law on $C([0,1])$ we denote by $\Pi^\alpha$. Our general results below imply the following proposition, which, since $p_0$ is bounded away from zero, implies the same contraction rate in Hellinger distance $h$. Note moreover that the result for $2 < r < \infty$ could be obtained from interpolation properties of $L^r$-spaces.

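Proposition 1. Let $X_1, \dots, X_n$ be i.i.d. on $[0,1]$ with density $p_0$ satisfying $\|\log p_0\|_{\alpha,\infty} \le B$. Let $1 \le r \le \infty$, $\bar r = \max(2, r)$, $r^* = \min(r, 2)$, and suppose $\alpha > 1 - 1/r^*$. Then there exist finite positive constants $M$, $\eta = \eta(\alpha, r)$ such that, as $n \to \infty$,

(6) $\Pi^\alpha\{p \in \mathcal{P} : \|p - p_0\|_r > M n^{-(\alpha - 1/2 + 1/\bar r)/(2\alpha + 1)} (\log n)^\eta \mid X_1, \dots, X_n\} \to 0$ in $P_0^{\mathbb{N}}$-probability.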

2.3. Dirichlet mixtures. Consider first, as in [9, 12, 13], a normal mixture prior $\Pi$, defined as follows: for $\varphi$ the standard normal density, set:

- $p_{F,\sigma} = \int \sigma^{-1} \varphi((\cdot - y)/\sigma)\, dF(y)$,

- $F \sim D_\alpha$, the Dirichlet process with base measure $\alpha = \alpha(\mathbb{R}) \bar\alpha$, $\alpha(\mathbb{R}) < \infty$ and $\bar\alpha$ a probability measure,

- $\sigma \sim G$, where $G$ is a probability distribution with compact support in $(0, \infty)$.

Proposition 2. Let $X_1, \dots, X_n$ be i.i.d. on $\mathbb{R}$ with density $p_{F_0,\sigma_0}$ where $\sigma_0 > 0$ and where $F_0$ is supported in $[-k_0, k_0]$, $k_0 > 0$. Suppose that $G$ has a positive continuous density in a neighborhood of $\sigma_0$, and that the base measure $\alpha$ has compact support and a continuous density on an interval containing $[-k_0, k_0]$. Then there exist finite positive constants $M, \eta$ such that

(7) $\Pi\left\{p \in \mathcal{P} : \|p - p_0\|_\infty > M \frac{(\log n)^\eta}{\sqrt{n}} \,\Big|\, X_1, \dots, X_n\right\} \to 0$ in $P_0^{\mathbb{N}}$-probability as $n \to \infty$.

Consider next a random histogram based on a Dirichlet process, similar to the priors studied in [29]: for $j \in \mathbb{N}$ let $\mathrm{Dir}_j$ be a Dirichlet distribution on the $2^j$-dimensional unit simplex, with all parameters equal to one. Consider the dyadic random histogram with resolution level $j$

$\sum_{k=1}^{2^j} \alpha_{jk} 2^j 1_{((k-1)/2^j,\, k/2^j]}(x), \quad \{\alpha_{jk}\} \sim \mathrm{Dir}_j, \quad x \in [0,1],$

and denote its law on the space of probability densities by $\Pi_j$. Note that this prior is not concentrated uniformly (in $j$) on bounded densities (despite the densities in the support being uniformly bounded for fixed $j$).
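
A draw from such a dyadic random histogram prior can be simulated as in the following sketch. It assumes the standard fact that a Dirichlet vector with all parameters equal to one is obtained by normalizing i.i.d. standard exponential variables; the function names `sample_dyadic_histogram` and `evaluate` are hypothetical.

```python
import math
import random

def sample_dyadic_histogram(j, rng=random):
    """Draw the 2**j bin heights alpha_{jk} * 2**j of one random histogram.

    (alpha_{jk})_k is Dirichlet(1, ..., 1) on the unit simplex, sampled here
    by normalizing i.i.d. standard exponential variables.
    """
    m = 2 ** j
    e = [rng.expovariate(1.0) for _ in range(m)]
    s = sum(e)
    return [m * x / s for x in e]

def evaluate(heights, x):
    """Evaluate the histogram at x in (0, 1]; bin k covers ((k-1)/m, k/m]."""
    m = len(heights)
    k = min(m, max(1, math.ceil(x * m)))
    return heights[k - 1]
```

Each draw integrates to one, and a single bin height can be as large as $2^j$, which illustrates why no bound on the sup-norm holds uniformly in $j$.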

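Proposition 3. Let $X_1, \dots, X_n$ be i.i.d. on $[0,1]$ with density $p_0 \in C^\alpha([0,1])$, $0 < \alpha \le 1$, satisfying $p_0 > 0$ on $[0,1]$. Let $j_n$ be such that $2^{j_n} \sim (n/\log n)^{1/(2\alpha + 1)}$, let $1 \le r \le \infty$, $\bar r = \max(2, r)$ and let either $\alpha > 1/2$ or $r = 1$. Then for some $M$, $\eta = \eta(\alpha, r)$, as $n \to \infty$

(8) $\Pi_{j_n}\{p \in \mathcal{P} : \|p - p_0\|_r > M n^{-(\alpha - 1/2 + 1/\bar r)/(2\alpha + 1)} (\log n)^\eta \mid X_1, \dots, X_n\} \to 0$ in $P_0^{\mathbb{N}}$-probability.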

2.4. Gaussian process priors. We now study a variety of Gaussian pro- 
cess priors that were considered in the nonparametric Bayes literature re- 
cently; see [32, 34] for references. To reduce technicalities we shall restrict 
ourselves to integrated Brownian motions, but see also the remark below. 

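Definition 2. Let $B(t) = B_{1/2}(t)$, $t \in [0,1]$, be a (sample-continuous version of) standard Brownian motion. For $\alpha > 1$, $\alpha \in \{n - 1/2 : n \in \mathbb{N}\}$, setting $\{\alpha\} = \alpha - [\alpha]$, $[\alpha]$ being the integer part of $\alpha$, $B_\alpha$ is defined as the $[\alpha]$-fold integral

$B_\alpha(t) = \int_0^t \int_0^{t_{[\alpha]-1}} \cdots \int_0^{t_1} B(s)\, ds\, dt_1 \cdots dt_{[\alpha]-1} = \frac{1}{([\alpha] - 1)!} \int_0^t (t - s)^{[\alpha] - 1} B(s)\, ds, \quad t \in [0,1],$

where for $[\alpha] = 1$ the multiple integral is understood to be only $\int_0^t B(s)\, ds$.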

Following [23, 32], and as before Proposition 1, we would like to define our prior on densities as the probability law of the random process

(9) $\frac{e^{B_\alpha}}{\int_0^1 e^{B_\alpha(t)}\, dt},$

but we must make two corrections: first, since $B_\alpha^{(k)}(0) = 0$ a.s., $k \le [\alpha]$, would impose unwanted conditions on the value at zero of the density, we should release $B_\alpha$ at zero, that is, take $\bar B_\alpha := \sum_{k=0}^{[\alpha]} Z_k t^k / k! + B_\alpha$, where $Z_k$ are i.i.d. $N(0,1)$ variables independent of $B_\alpha$; see [32]. In order to deal with bounded densities, we introduce a second modification to (9), and define our prior (on the Borel sets of $C([0,1])$) as

(10) $\Pi = \mathcal{L}\left( \frac{e^{\bar B_\alpha}}{\int_0^1 e^{\bar B_\alpha(t)}\, dt} \,\Big|\, \|\bar B_\alpha\|_\infty \le c \right),$

where $c$ is a fixed arbitrary positive constant. This prior works as follows: if $A \subset C([0,1])$ is a measurable set of continuous densities on $[0,1]$, then

$\Pi(A) = \Pr\left\{ e^{\bar B_\alpha} \Big/ \int_0^1 e^{\bar B_\alpha} \in A,\ \|\bar B_\alpha\|_\infty \le c \right\} \Big/ \Pr\{\|\bar B_\alpha\|_\infty \le c\},$

and clearly the denominator is strictly positive for all $c > 0$; see Proposition 7 below.

Proposition 4. Let $1 \le r \le \infty$, $\bar r = \max(r, 2)$, $\alpha \in \{n - 1/2 : n \in \mathbb{N}\}$ and assume (a) $p_0 \in C^\alpha([0,1])$, and (b) $p_0$ is bounded and bounded away from zero, say, $2\|\log p_0\|_\infty \le c < \infty$. Let $\Pi$ be the prior defined by (10) where $\alpha$ is as in (a) and $c$ is as in (b). Then, if $X_i$ are i.i.d. with common law $P_0$ of density $p_0$, there exists $M < \infty$ s.t.

$\Pi\{p \in \mathcal{P} : \|p - p_0\|_r > M n^{-(\alpha - 1/2 + 1/\bar r)/(2\alpha + 1)} (\log n)^{(1/2) 1\{r = \infty\}} \mid X_1, \dots, X_n\} \to 0$

in $P_0^{\mathbb{N}}$-probability as $n \to \infty$.

As remarked before Proposition 1, a contraction result in the Hellinger 
distance follows as well, and the case 2 < r < oo could be obtained from 
interpolation. 

The result in Proposition 4 extrapolates to fractional multiple integrals of Brownian motion (Riemann-Liouville processes) of any real valued index $\alpha > 1/2$, and it also extends to the related fractional Brownian motion processes (see, e.g., [32] for definitions), but, for conciseness and clarity of exposition, we refrain from carrying out these extensions.

2.5. Sharp rates in the Gaussian conjugate situation. We currently have no proof that the rates obtained in the previous subsections are optimal for these priors as soon as $r > 2$. While we conjecture that Bayesian posteriors may suffer from suboptimal contraction rates in density estimation problems in $L^r$-loss, $r > 2$, we finally show here that in the much simpler conjugate situation of nonparametric regression with Gaussian errors, sharp rates in all $L^r$-norms can be obtained at least for certain diagonal wavelet priors. The proof of this result follows from a direct analysis of the posterior distribution, available in closed form due to conjugacy.

Given a noise level $1/\sqrt{n}$, $n \in \mathbb{N}$, we observe

(11) $dY^{(n)}(t) = f(t)\, dt + \frac{1}{\sqrt{n}}\, dB(t), \quad t \in [0,1],$

for $f = f_0 \in L^2([0,1])$, where $B$ is Brownian motion on $[0,1]$. This model is well known to be asymptotically equivalent to nonparametric regression with fixed, equally-spaced design and Gaussian errors.

Consider priors on $L^2([0,1])$ defined on an $S$-regular CDV-wavelet basis as

(12) $\Pi = \mathcal{L}\left( \sum_{k=0}^{N} g_k \phi_k + \sum_{\ell = J_0}^{\infty} \sum_{k=0}^{2^\ell - 1} \sqrt{\mu_\ell}\, g_{\ell k} \psi_{\ell k} \right)$

in $L^2([0,1])$, with the $g$'s i.i.d. $N(0,1)$ and with $\mu_\ell = \ell^{-1} 2^{-\ell(2\alpha + 1)}$ for all $\ell \ge J_0$. Such a prior is designed for $\alpha$-smooth $f_0$. As is easily seen, the series in (12) converges uniformly almost surely.
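
Because the prior is Gaussian and diagonal in the wavelet basis, the posterior can be computed coefficient by coefficient. The sketch below is the standard normal-normal conjugate update and is given only as an illustration under an assumed sequence-space formulation: an observed coefficient $y = \theta + n^{-1/2} \varepsilon$ with $\varepsilon \sim N(0,1)$ and prior $\theta \sim N(0, \mu)$. The function names are hypothetical, and this is not claimed to be the article's own derivation.

```python
def mu_level(level, alpha):
    """Prior variance l**(-1) * 2**(-l * (2 * alpha + 1)) at resolution level l >= 1."""
    return 2.0 ** (-level * (2.0 * alpha + 1.0)) / level

def posterior_coefficient(y, n, prior_var):
    """Posterior mean and variance of a single coefficient theta.

    Assumed model: y = theta + N(0, 1/n) noise, prior theta ~ N(0, prior_var).
    The posterior is N(shrink * y, prior_var / (1 + n * prior_var)).
    """
    shrink = n * prior_var / (1.0 + n * prior_var)
    return shrink * y, prior_var / (1.0 + n * prior_var)
```

At high resolution levels `mu_level` is tiny, so the posterior mean shrinks the observed coefficient strongly toward zero, while at coarse levels with large `n` the data dominate.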

Theorem 1. Let $0 < \alpha < S$, and let $\Pi$ be the Gaussian prior on $L^2([0,1])$ defined by (12) based on a CDV wavelet basis of $L^2([0,1])$ of smoothness at least $S$. Let $f_0 \in C^\alpha([0,1])$, let $\varepsilon_n = (n/\log n)^{-\alpha/(2\alpha + 1)}$ and suppose we observe $dY^{(n)}(t) = f_0(t)\, dt + dB(t)/\sqrt{n}$. Then there exist $C < \infty$ and $M_0 < \infty$ depending only on the wavelet basis, $\alpha$ and $\|f_0\|_{\alpha,\infty,\infty}$ such that, for every $M_0 \le M < \infty$, and for all $1 \le r \le \infty$, $n \in \mathbb{N}$,

(13) $E\, \Pi(f : \|f - f_0\|_r > M \varepsilon_n \mid Y^{(n)}) \le n^{-C^2 (M - M_0)^2}.$

This rate of convergence is sharp (in case $r < \infty$ up to the $\log n$-term) in view of the usual minimax lower bounds and since the contraction rate implies the same rate of convergence for the formal Bayes estimator $E_\Pi(f \mid Y^{(n)})$ to $f_0$ (using Anderson's lemma and the fact that the posterior is a random Gaussian measure on $L^2([0,1])$, as inspection of the proof shows). One may even apply the usual thresholding techniques to the posterior mean to obtain a Bayesian rate adaptive estimator of $f_0$ by proceeding as in [17, 25].

3. General contraction theorems for density estimates in $L^r$-loss, $1 \le r \le \infty$. We shall, in our main results, use properties of various approximation schemes in function spaces, based on integrating a localized kernel-type function $K_j(x,y)$ against functions $p$, $K_j(p) = \int K_j(\cdot, y) p(y)\, dy$. Let, in slight abuse of notation, for $T \subseteq \mathbb{R}$, $L^1(\mu_w) = L^1(T, \mathcal{B}, \mu_w)$, $w \ge 0$, be the space of $\mu_w$-integrable functions, $d\mu_w(t) = (1 + |t|)^w\, dt$, normed by $\|f\|_{\mu_w} = \int_T |f(t)| (1 + |t|)^w\, dt$. Recall the notion of $p$-variation of a function (e.g., as before Lemma 1 in [17]).

Condition 1. Let $T = \mathbb{R}$ or $T = [0,1]$. The sequence of operators $K_j(x,y) = 2^j K(2^j x, 2^j y)$; $x, y \in T$, $j \ge 0$, is called an admissible approximating sequence if it satisfies one of the following conditions:

(a) (convolution kernel case): $K(x,y) = K(x - y)$, where $K \in L^\infty(T)$ is of bounded $p$-variation for some finite $p \ge 1$, right (or left) continuous, and satisfies $\|K\|_{\mu_w} < \infty$ for some $w > 2$.

(b) (multiresolution projection case): $K(x,y) = \sum_k \phi(x - k) \phi(y - k)$, the sum extending over any subset of $\mathbb{Z}$, where $\phi \in L^1 \cap L^\infty$ has bounded $p$-variation for some finite $p \ge 1$ and satisfies, in addition, $\sup_{x \in \mathbb{R}} \sum_k |\phi_k(x)| < \infty$ as well as $|K(x,y)| \le \Phi(|x - y|)$ for every $x, y \in T$ and some function $\Phi \in L^\infty(\mathbb{R})$ for which $\|\Phi\|_{\mu_w} < \infty$ for some $w > 2$.

(c) (multiresolution case, $T = [0,1]$): $K(x,y) = \sum_k \phi_k(x) \phi_k(y)$ is the projection kernel of a Cohen-Daubechies-Vial (CDV) wavelet basis.

Condition (a) is a standard assumption on kernels; condition (b) is satisfied for most wavelet bases on $\mathbb{R}$, such as Daubechies, Meyer or spline wavelets, by using standard wavelet theory (e.g., [19]). For part (c) we note the following: as in the case of the whole line, an orthonormal basis of $V_j = \{\phi_{jk} = 2^{j/2} \phi_k(2^j \cdot)\}$ is obtained from $2^{j - J_0}$-fold dilates of the basic linear span $V_{J_0}$, for every $j \ge J_0$ (page 73 in [8]). In this case, $V_j$ has dimension $2^j$, and a basis consists of: (i) $N$ left edge functions $\phi^0_{jk}(x) = 2^{j/2} \phi^0_k(2^j x)$, $k = 0, \dots, N - 1$, where $\phi^0_k$ is a modification of $\phi$, which is still bounded and of bounded support; (ii) $N$ right edge functions $\phi^1_{jk}(x) = 2^{j/2} \phi^1_k(2^j x)$, $k = 0, \dots, N - 1$, with $\phi^1_k$ also modifications of $\phi$, bounded and of bounded support; and then the $2^j - 2N$ "interior" usual translations of dilations of $\phi$, $\phi_{jk}$, $k = N, \dots, 2^j - N - 1$. The projection kernel $K_j(x,y) = K^0_j(x,y) + K^1_j(x,y) + \tilde K_j(x,y)$ corresponds to the projection onto the three orthogonal components of $V_j$ (the linear spans, respectively, of the left edge functions $\phi^0_{jk}$, the right edge functions $\phi^1_{jk}$, and the interior functions $\phi_{jk}$). The first two spaces have dimension $N$ and the third, $2^j - 2N$. By Lemma 8.6 in [19], there exist bounded, compactly supported nonnegative functions $\Phi$ such that $\tilde K(x,y) \le \Phi(|x - y|)$, for all $x, y$. We call this function a majorizing kernel of the interior part of $K$.

Let $X_i$ be i.i.d. with law $P_0$ and density $p_0$.

Theorem 2. Let $T = [0,1]$ or $T = \mathbb{R}$, let $\mathcal{P} = \mathcal{P}(T)$ be a set of probability densities on $T$, and let $\Pi_n$ be priors defined on some $\sigma$-algebra of $\mathcal{P}$ for which the maps $p \mapsto p(x)$ are measurable for all $x \in T$. Let $1 \le r \le \infty$ and let $\varepsilon_n \to 0$ as $n \to \infty$ be a sequence of positive numbers such that $\sqrt{n} \varepsilon_n \to \infty$ as $n \to \infty$. Let

(14) $\delta_n = \varepsilon_n (n \varepsilon_n^2)^{1/2 - 1/(2r)} \gamma_n$

for some sequence $\gamma_n$ satisfying $\gamma_n \ge 1$ for all $n$. Let $J_n$ be any sequence satisfying $2^{J_n} \le c n \varepsilon_n^2$ for some fixed $0 < c < \infty$, and let $K_j$ be an admissible approximating sequence. Let $\mathcal{P}_n$ be a sequence of subsets of

(15) $\{p \in \mathcal{P} : \|K_{J_n}(p) - p\|_r \le C(K) \delta_n,\ \|p\|_{\mu_w} \le D\},$

where $C(K)$ is a constant that depends only on the operator kernel $K$, $D$ is a fixed constant, and where $w > (2 - r)/r$ if $r < 2$, $w = 0$ if $r \ge 2$.
Assume there exists $C > 0$ such that, for every $n$ large enough:

(1) $\Pi_n(\mathcal{P} \setminus \mathcal{P}_n) \le e^{-(C + 4) n \varepsilon_n^2}$ and

(2) $\Pi_n\{p \in \mathcal{P} : -P_0 \log \frac{p}{p_0} \le \varepsilon_n^2,\ P_0 (\log \frac{p}{p_0})^2 \le \varepsilon_n^2\} \ge e^{-C n \varepsilon_n^2}$.

Let $p_0 \in L^r(T)$ be s.t. $\|K_{J_n}(p_0) - p_0\|_r = O(\delta_n)$ and s.t. $\|p_0\|_{\mu_w} < \infty$ if $T = \mathbb{R}$, $1 \le r < 2$. If $\delta_n \to 0$ as $n \to \infty$, then there exists $M < \infty$ such that

(16) $\Pi_n\{p \in \mathcal{P} : \|p - p_0\|_r \ge M \delta_n \mid X_1, \dots, X_n\} \to 0$ as $n \to \infty$

in $P_0^{\mathbb{N}}$-probability.

Note that the moment condition in (15) is void if $r \ge 2$ or if $T = [0,1]$. If $r = 1$ the rate can be taken to be $\delta_n = \varepsilon_n$ or, more generally, $\delta_n = \varepsilon_n \gamma_n$. For $r = \infty$ one only has at best $\delta_n = \sqrt{n} \varepsilon_n^2$, which is always slower than $\varepsilon_n$ (since $\sqrt{n} \varepsilon_n \to \infty$). In case $1 < r < \infty$ the rate interpolates between these two rates without, however, requiring $p_0 \in L^\infty$.

In the case where $p_0$ is bounded, and if it is known that the posterior concentrates on a fixed sup-norm ball with probability approaching one, we can refine the rates in the above theorem for $1 < r < \infty$, and retrieve the (in applications of the theorem often optimal) rate $\varepsilon_n$ for $1 < r \le 2$. The following theorem can be applied with $\gamma_n = 1$ for all $n$, in which case conditions (a) and (b) require the rate $\varepsilon_n$ to be fast enough (which in applications typically entails that a minimal degree of smoothness of $p_0$ has to be assumed).

Theorem 3. Let $T, \mathcal{P}, \Pi_n$ be as in Theorem 2. Let $1 \le r \le \infty$, and let $\varepsilon_n \to 0$ as $n \to \infty$ be a sequence of positive numbers such that $\sqrt{n} \varepsilon_n \to \infty$ as $n \to \infty$. Let $\bar r = \max(r, 2)$, and set

(17) $\delta_n = \varepsilon_n (n \varepsilon_n^2)^{1/2 - 1/\bar r} \gamma_n$

for some sequence $\gamma_n \ge 1$. Assume either:

(a) that $1 \le r < 2$ and that $\varepsilon_n = O(\gamma_n (n \varepsilon_n^2)^{1/r - 1})$ or

(b) that $2 \le r \le \infty$ and that $\varepsilon_n^2 = O(\gamma_n / \sqrt{n})$.

Let $J_n$, $\mathcal{P}_n$ be defined as in Theorem 2, assume that conditions (1) and (2) in that theorem are satisfied, and that, in addition,

(3) there exists $0 < B < \infty$ such that

$\Pi_n(p \in \mathcal{P} : \|p\|_\infty > B \mid X_1, \dots, X_n) \to 0$

as $n \to \infty$ in $P_0^{\mathbb{N}}$-probability.

Let $p_0 \in L^\infty(T)$ be s.t. $\|K_{J_n}(p_0) - p_0\|_r = O(\delta_n)$ and such that $\|p_0\|_{\mu_w} < \infty$ for some $w > (2 - r)/r$ if $T = \mathbb{R}$, $1 \le r < 2$. If $\delta_n \to 0$ as $n \to \infty$, then there exists $M < \infty$ s.t.

(18) $\Pi_n\{p \in \mathcal{P} : \|p - p_0\|_r \ge M \delta_n \mid X_1, \dots, X_n\} \to 0$ as $n \to \infty$

in $P_0^{\mathbb{N}}$-probability.
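
As a worked illustration (a sketch that ignores logarithmic factors and takes $\gamma_n = 1$), one can check how the rate $\delta_n = \varepsilon_n (n\varepsilon_n^2)^{1/2 - 1/\bar r}\gamma_n$ of Theorem 3 reproduces the rate announced in the introduction when $\varepsilon_n = n^{-\alpha/(2\alpha+1)}$, and what conditions (a) and (b) then require of $\alpha$. The choice of $\varepsilon_n$ is an assumption made here for illustration only.

```latex
% Assumption: \varepsilon_n = n^{-\alpha/(2\alpha+1)}, \gamma_n = 1, log factors ignored.
\begin{align*}
n\varepsilon_n^2 &= n^{1 - 2\alpha/(2\alpha+1)} = n^{1/(2\alpha+1)}, \\
\delta_n &= \varepsilon_n \,(n\varepsilon_n^2)^{1/2 - 1/\bar r}
          = n^{-\alpha/(2\alpha+1)} \, n^{(1/2 - 1/\bar r)/(2\alpha+1)}
          = n^{-(\alpha - 1/2 + 1/\bar r)/(2\alpha+1)} . \\
\intertext{Condition (a), $\varepsilon_n = O\bigl((n\varepsilon_n^2)^{1/r - 1}\bigr)$ for $r < 2$, reads}
n^{-\alpha/(2\alpha+1)} &\lesssim n^{(1/r - 1)/(2\alpha+1)}
  \iff \alpha \ge 1 - 1/r , \\
\intertext{and condition (b), $\varepsilon_n^2 = O(n^{-1/2})$ for $r \ge 2$, reads}
n^{-2\alpha/(2\alpha+1)} &\lesssim n^{-1/2}
  \iff \alpha \ge 1/2 .
\end{align*}
```

Both requirements are of the form $\alpha \ge 1 - 1/\min(r, 2)$, which is a minimal smoothness condition of the kind mentioned before the theorem.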



3.1. $L^r$-norm inequalities. A main step in the proof of Theorems 2 and 3 [see (30) below] is the construction of nonparametric tests for $L^r$-alternatives, $1 \le r \le \infty$, that have sufficiently good exponential bounds on the type-two errors. For this we first derive sharp concentration inequalities for $L^r$-norms of centered density estimators. It is convenient to observe that the degree of concentration of a kernel-type density estimator around its expectation in $L^r$ depends on $r$, as can already be seen from comparing the known cases $r = 1, \infty$ in [14, 16] for kernel estimators and [17] for wavelets. These results are derived from Talagrand's inequality [31] for empirical processes: let $X_1, \dots, X_n$ be i.i.d. with law $P$ on a measurable space $(S, \mathcal{S})$, let $\mathcal{F}$ be a $P$-centered (i.e., $\int f\, dP = 0$ for all $f \in \mathcal{F}$) countable class of real-valued measurable functions on $S$, uniformly bounded by the constant $U$, and set $\|H\|_{\mathcal{F}} = \sup_{f \in \mathcal{F}} |H(f)|$ for any $H : \mathcal{F} \to \mathbb{R}$. Let $\sigma$ be any positive number such that $\sigma^2 \ge \sup_{f \in \mathcal{F}} E(f^2(X))$, and set $V := n\sigma^2 + 2U E\|\sum_{j=1}^n f(X_j)\|_{\mathcal{F}}$. Then, Bousquet's [5] version of Talagrand's inequality, with constants, is as follows (see Theorem 7.3 in [5]): for every $x \ge 0$, $n \in \mathbb{N}$,

(19) $\Pr\left\{ \left\| \sum_{j=1}^n f(X_j) \right\|_{\mathcal{F}} \ge E \left\| \sum_{j=1}^n f(X_j) \right\|_{\mathcal{F}} + \sqrt{2Vx} + Ux/3 \right\} \le 2e^{-x}.$



This applies to our situation as follows: let $X_1, \dots, X_n$ be i.i.d. with density $p_0$ on $T$ with respect to Lebesgue measure $\lambda$, $dP_0 = p_0\, d\lambda$, and let $p_n(j) = \frac{1}{n} \sum_{i=1}^n K_j(\cdot, X_i)$ be a kernel-type estimator with $K_j$ as in Condition 1. Its expectation equals $P_0^n p_n(j)(x) = E K_j(x, X) = K_j(p_0)(x)$, and we wish to derive sharp exponential bounds for the quantity $\|p_n(j) - K_j(p_0)\|_r$ for $1 \le r \le \infty$. In case $r = \infty$ this can be achieved by studying the empirical process indexed by

$\mathcal{K} = \{K_j(x, \cdot) - K_j(p_0)(x) : x \in T\},$

and in case $r < \infty$ we shall view $p_n(j) - P_0^n p_n(j)$ as a sample average of the centered $L^r(T)$-valued random variables $K_j(\cdot, X_i) - K_j(p_0)$, and reduce the problem to an empirical process as follows: let $s$ be conjugate to $r$, that is, $1 = 1/s + 1/r$. By the Hahn-Banach theorem, the separability of $L^r(T)$ implies that there is a countable subset $B_0$ of the unit ball $B$ of $L^s(T)$ such that

$\|H\|_r = \sup_{f \in B_0} \left| \int_T H(t) f(t)\, dt \right|$

for all $H \in L^r(T)$. We thus have $\|p_n(j) - P_0^n p_n(j)\|_r = \|P_n - P_0\|_{\mathcal{K}}$, where $P_n = \sum_{i=1}^n \delta_{X_i}/n$ is the empirical measure, and where

$\mathcal{K} = \left\{ x \mapsto \int_T f(t) K_j(t, x)\, dt - \int_T f(t) K_j(p_0)(t)\, dt : f \in B_0 \right\}.$

To apply (19) with the countable class $\mathcal{K}$ we need to find suitable bounds for the envelope $U \ge \sup_{k \in \mathcal{K}} |k(x)|$ and the weak variances $\sigma^2 \ge \sup_{k \in \mathcal{K}} E k^2(X)$. We will also apply (19) in the case $r = \infty$, and note that the corresponding empirical process suprema are over countable subsets $B_0$ of $T$, by the continuity property of $K$ in the convolution kernel case, and by finiteness of the $p$-variation of the scaling function in the wavelet case (Remark 2 in [17]).

3.1.1. Envelope and variance bounds for $\mathcal{K}$. We first consider Condition 1(a), the convolution kernel case: let us write in abuse of notation $K_j(\cdot) = 2^j K(2^j \cdot)$ and $f = \delta_y$, $y \in B_0 \subset T$ for $r = \infty$. (One naturally replaces $L^s$ by the Banach space of finite signed measures if $r = \infty$ in the arguments below.) The class $\mathcal{K}$ then equals

$\mathcal{K} = \{x \mapsto K_j * f(x) - E(K_j * f(X)) : f \in B_0\}.$

The bound for the envelope is seen to be of size $2^{j(1 - 1/r)}$: by Hölder's inequality

(20) $\|K_j * f\|_\infty \le \|K_j\|_r \|f\|_s \le C(K, r) 2^{j(1 - 1/r)} \equiv U,$

a bound that remains true when $r = \infty$ since $|2^j K(2^j(x - y))| \le \|K\|_\infty 2^j$. To bound the variances, for densities $p_0 \in L^r$, we have

(21) $E(K_j * f)^2(X) \le \|p_0\|_r \|K_j * f\|_{2s}^2 \le C'(K, r) \|p_0\|_r 2^{j(1 - 1/r)} \equiv \sigma^2$

from Hölder's inequality and since $\|K_j * f\|_{2s}$, for $f \in L^s$, is bounded up to constants by $2^{j(1 - 1/r)/2}$, by using Young's inequality $\|h * g\|_t \le \|h\|_p \|g\|_q$ for $1 + 1/t = 1/p + 1/q$, $1 \le p, q, t \le \infty$.
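
The scaling $\|K_j\|_r = 2^{j(1-1/r)} \|K\|_r$ that drives the envelope bound can be checked numerically, as in the following sketch. The triangular kernel $K(x) = \max(0, 1 - |x|)$, the midpoint rule and all function names are assumptions introduced for illustration; any bounded, compactly supported kernel would serve equally well.

```python
def lr_norm(fn, a, b, r, m=20_000):
    """Midpoint-rule approximation of (int_a^b |fn|^r)^(1/r) for finite r >= 1."""
    width = (b - a) / m
    total = sum(abs(fn(a + (i + 0.5) * width)) ** r for i in range(m))
    return (total * width) ** (1.0 / r)

def triangle(x):
    """Triangular kernel K(x) = max(0, 1 - |x|), used only as an example."""
    return max(0.0, 1.0 - abs(x))

def dilated(j):
    """Return the dilated kernel K_j(x) = 2**j * K(2**j * x)."""
    return lambda x: 2.0 ** j * triangle(2.0 ** j * x)

def scaling_ratio(j, r):
    """Ratio ||K_j||_r / ||K||_r, which should be close to 2**(j * (1 - 1/r))."""
    half = 2.0 ** (-j)  # K_j is supported on [-2**-j, 2**-j]
    return lr_norm(dilated(j), -half, half, r) / lr_norm(triangle, -1.0, 1.0, r)
```

For example, `scaling_ratio(3, 3.0)` should be close to $2^{3(1 - 1/3)} = 4$, matching the size of the envelope $U$ in the Hölder bound.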

The last estimate can be refined if $p_0$ is known to be bounded, where we recall that $\bar r = \max(r, 2)$, to yield

(22) $E(K_j * f)^2(X) \le C(p_0) 2^{j(1 - 2/\bar r)} \equiv \sigma^2,$

where $C(\cdot)$ is bounded on uniformly bounded sets of densities. To see this, consider first $r > 2$ and thus $s < 2$: then Young's inequality gives, as above,

$E(K_j * f)^2(X) \le \|p_0\|_\infty \|K_j * f\|_2^2 \le C \|p_0\|_\infty 2^{j(1 - 2/r)} \equiv \sigma^2.$

If $1 \le r \le 2$, then $p_0 \in L^\infty \cap L^1 \subset L^{s/(s-2)}$, so by Hölder's inequality

$E(K_j * f)^2(X) \le \|K_j * f\|_s^2 \|p_0\|_{s/(s-2)} \le C(p_0) \|K_j\|_1^2 \|f\|_s^2 \le C(p_0, K).$

For Condition 1(b), so in the multiresolution case for $T = \mathbb{R}$, the arguments as in (a) and obvious modifications give the same bounds for $U$, $\sigma$ in view of the estimate $|\int K_j(x,y) f(y)\, dy| \le \Phi_j * |f|(x)$, which allows us to compare wavelet projections to convolutions and proceed as above.

For Condition 1(c), note that, by the comments following the statement of Condition 1, the projection kernels have the form $K_j = K^0_j + K^1_j + \tilde K_j$ where $\tilde K_j(x,t) = 2^j \tilde K(2^j t, 2^j x)$ with $\tilde K$ majorized by a convolution kernel. Therefore the envelope and variance bounds for the previous two cases apply as well to this "interior part" of the kernel. For the boundary part,

(23) $K^i_j(x,t) = \sum_{k=0}^{N-1} 2^j \phi^i_k(2^j x) \phi^i_k(2^j t), \quad i = 0, 1,\ j \ge J_0,$

with $N$ finite and $\phi^i_k$ bounded and with bounded support, it is immediate to check, just using Hölder's inequality, that for $f \in B_0$,

$\left| 2^j \phi^i_k(2^j x) \int \phi^i_k(2^j t) f(t)\, dt \right| \le \|\phi^i_k\|_\infty \|\phi^i_k\|_r 2^{j(1 - 1/r)}, \quad 1 \le r \le \infty,$

and that

$2^{2j} E(\phi^i_k(2^j X))^2 \left( \int |\phi^i_k(2^j t)| |f(t)|\, dt \right)^2 \le \|p_0\|_r \|\phi^i_k\|_{2s}^2 \|\phi^i_k\|_r^2 2^{j(1 - 1/r)}$

for $p_0 \in L^r$, with the refinement $\|p_0\|_\infty \|\phi^i_k\|_2^2 \|\phi^i_k\|_r^2 2^{j(1 - 2/r)}$ if $\|p_0\|_\infty < \infty$. This shows that the bounds for $U$, $\sigma^2$ from (a), (b) apply to (c) as well.

3.1.2. Application of Talagrand's inequality. To apply Talagrand's inequality we need a bound on the moment of the supremum of the empirical process involved, provided in the following lemma, known for the cases $r = \infty$ (see [14, 17, 25]) and, implicitly, $1 \le r \le 2$ (see [15]). As the proof is standard but somewhat lengthy it is given in the supplementary file for this paper, [18].

Lemma 1. Assume Condition 1(a), (b) or (c) and that $p_0 \in L^r(T)$. If $1 \le r < 2$ in the cases (a) or (b), assume further that $p_0 \in L^1(\mu_s)$ for some $s > (2 - r)/r$. Then, if $1 \le r < \infty$, there exists $L_r$ such that, for all $j \ge 0$ if $r \le 2$, and for all $j$ such that $2^j < n$ for $r > 2$, we have

(24) $E\|n(P_n - P_0)\|_{\mathcal{K}} = E \left\| \sum_{i=1}^n (K_j(\cdot, X_i) - E K_j(\cdot, X)) \right\|_r \le L_r \sqrt{2^j n}.$

If $r = \infty$, for $p_0$ and $\Phi$ bounded, there exists a constant $L_\infty$ such that for all $j$ satisfying $2^j j < n$ we have

(25) $E\|n(P_n - P_0)\|_{\mathcal{K}} = E \left\| \sum_{i=1}^n (K_j(\cdot, X_i) - E K_j(\cdot, X)) \right\|_\infty \le L_\infty \sqrt{2^j j n}.$

We are now ready to apply (19): for $V = n\sigma^2 + 2U n E\|p_n(j) - E p_n(j)\|_r$ we have the bound

$\Pr\left\{ n\|p_n(j) - P_0^n p_n(j)\|_r \ge n E\|p_n(j) - P_0^n p_n(j)\|_r + \sqrt{2Vx} + \frac{Ux}{3} \right\} \le 2e^{-x}.$

This can be further simplified, using the standard inequalities $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$, $\sqrt{ab} \le (a + b)/2$, to

$\Pr\left\{ n\|p_n(j) - P_0^n p_n(j)\|_r \ge \frac{3}{2} n E\|p_n(j) - P_0^n p_n(j)\|_r + \sqrt{2n\sigma^2 x} + \frac{7}{3} Ux \right\} \le 2e^{-x}.$
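
The simplification step can be spelled out as follows. This is a sketch of one admissible choice of constants obtained from the two elementary inequalities just quoted; the abbreviation $S_n$ is introduced here only for readability, and the constants are not claimed to be the only possible ones.

```latex
% Notation (assumed here): S_n = n E\|p_n(j) - P_0^n p_n(j)\|_r, so V = n\sigma^2 + 2 U S_n.
\begin{align*}
\sqrt{2Vx} &= \sqrt{2n\sigma^2 x + 4 U S_n x}
  \le \sqrt{2n\sigma^2 x} + \sqrt{S_n \cdot 4Ux}
  && (\sqrt{a+b} \le \sqrt{a} + \sqrt{b}) \\
 &\le \sqrt{2n\sigma^2 x} + \tfrac12 S_n + 2Ux
  && (\sqrt{ab} \le (a+b)/2), \\
S_n + \sqrt{2Vx} + \tfrac13 Ux
 &\le \tfrac32 S_n + \sqrt{2n\sigma^2 x} + \tfrac73 Ux .
\end{align*}
```

Since the right-hand side dominates the threshold in Talagrand's inequality, the probability of exceeding it is still at most $2e^{-x}$.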

Combining the moment estimate Lemma 1 with (20) and (21), we obtain, 
for 2^j{r) < n with j(oo) = j and j{r) = 1 for r < oo, 

Mn\\Pn{j) - P^PniMr 

(26) . ^ 

> C{Jvnj{r) + A/n2i(i-iA)||po||^2; + 2^'^'^-^/'''^x)} < 26"^ 



for some constant C, and in the case where ||po||oo < oo we have, analogously, 
from (22), 

Pr{n||p„(i)-Po"Pn(j)||r 
(27) 



> C{^2^nj{r) + -yn2J(i-2/f)||pg||^^ + 2^^^-^/''^)] < 26"^ 

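If we take $\varepsilon_n$, $\delta_n$, $2^{J_n} \sim n\varepsilon_n^2$ as in Theorems 2, 3, and if $\|p_0\|_r$ is bounded
by a fixed constant $B$, then the choice $x = Ln\varepsilon_n^2$ gives for every $L$ and
$M = M(L, K, B)$ large enough, after some simple computations using the
conditions on $\varepsilon_n, \delta_n$ from the theorem, that

$$nM\delta_n \ge C\Big(\sqrt{2^{J_n} n j_n(r)} + \sqrt{\|p_0\|_r\, n 2^{J_n(1-1/r)} Ln\varepsilon_n^2} + 2^{J_n(1-1/r)} Ln\varepsilon_n^2\Big)$$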

and, likewise, if ||po||oo is bounded by a fixed constant, the corresponding 
choice of 5n,M also satisfies 

nM6n > C(^2i"j„(r)n + ^Jc{po)n2Ml~Vr)ln^2^ + 2^"^^~^^''^Lnel). 

Moreover for ||po||r ^ C > we have 



n\\po\\r > C{^2^"jn{r)n + ^J\\po\\rn2M-^-^/r)Lnel + 2^"^'^"'^/''^ Lnel) 

from some index no onwards that depends only on C,C- 

Using these inequalities in (26), (27), we conclude that in both cases, for
every $0 < L < \infty$ we can find a large enough $M(L, K, B)$ such that

(28) $$\Pr\{n\|p_n(J_n) - P_0^n p_n(J_n)\|_r > Mn\delta_n\} \le 2e^{-Ln\varepsilon_n^2}$$

and, likewise, for $n$ large enough,

(29) $$\Pr\{n\|p_n(J_n) - P_0^n p_n(J_n)\|_r > n\|p_0\|_r/3\} \le 2e^{-Ln\varepsilon_n^2}.$$
3.2. Proof of Theorems 2 and 3. Using the small ball estimate from
condition (2), it suffices to construct tests (indicator functions)
$\phi_n = \phi_n(X_1, \ldots, X_n; p_0)$ such that

(30) $$P_0^n\phi_n \to 0 \ \text{ as } n \to \infty \quad\text{and}\quad \sup_{p \in \mathcal{P}_n :\, \|p - p_0\|_r \ge M\delta_n} P^n(1 - \phi_n) \le 2e^{-(C+4)n\varepsilon_n^2}$$

for $n$ large enough; see the proof of Theorem 2.1 in [10].

Consider first Theorem 2. Let $p_n$ be a kernel-type density estimator based
on an i.i.d. sample $X_1, \ldots, X_n$ of common law $P_0$, $n \in \mathbb{N}$, at resolution $J_n$.
For $M_0$, a constant to be chosen below, set $T_n = \|p_n - p_0\|_r$ and $\phi_n = I\{T_n > M_0\delta_n\}$.
Note that $\phi_n$ is the (indicator of the) rejection region of a natural
test of the hypothesis $H_0 : p = p_0$. Then we have

$$P_0^n\phi_n = P_0^n\{\|p_n - p_0\|_r > M_0\delta_n\} \le P_0^n\{\|p_n - P_0^n p_n\|_r > M_0\delta_n - \|P_0^n p_n - p_0\|_r\}.$$

Since $\|K_{J_n}(p_0) - p_0\|_r \le c'\delta_n$ for some $c' > 0$ by assumption, we have for all $n$
large enough, $P_0^n\phi_n \le P_0^n\{\|p_n - P_0^n p_n\|_r > (M_0 - c')\delta_n\}$. Then using inequality (28),
we have for some constant $L_1$, choosing $M_0$
large enough, that, as $n \to \infty$,

(31) $$P_0^n\phi_n \le 2e^{-L_1 n\varepsilon_n^2} \to 0.$$

Let now $p$ be a density in $\mathcal{P}_n$ such that $\|p - p_0\|_r \ge M\delta_n$ (the alternatives).
Set $dP(x) = p(x)\,dx$. We have, from the triangle inequality,

(32) $$P^n(1 - \phi_n) = P^n\{\|p_n - p_0\|_r \le M_0\delta_n\} \le P^n\{\|p_n - P^n p_n\|_r \ge \|p - p_0\|_r - M_0\delta_n - \|P^n p_n - p\|_r\} \le P^n\{\|p_n - P^n p_n\|_r \ge \|p - p_0\|_r - (M_0 + C(K))\delta_n\}$$

since by assumption on $\mathcal{P}_n$, $\sup_{p \in \mathcal{P}_n} \|P^n p_n - p\|_r \le C(K)\delta_n$, uniformly in
$p \in \mathcal{P}_n$.

To complete the estimation of the last probability, we consider first $r > 1$.
For those $p \in \mathcal{P}_n$ satisfying $\|p\|_r > 2\|p_0\|_r$ we have $\|p - p_0\|_r \ge \|p\|_r/2 > \|p_0\|_r$,
and, using inequality (29) for $p_0 = p$, we deduce that for all $L > 0$,
there exists $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$,

(33) $$\sup_{p \in \mathcal{P}_n :\, \|p\|_r > 2\|p_0\|_r} P^n(1 - \phi_n) \le \sup_{p \in \mathcal{P}_n :\, \|p\|_r > 2\|p_0\|_r} P^n\Big\{\|p_n - P^n p_n\|_r \ge \frac{\|p\|_r}{3}\Big\} \le 2e^{-Ln\varepsilon_n^2}.$$
For those $p \in \mathcal{P}_n$ for which $\|p\|_r \le 2\|p_0\|_r$, we apply (28) with $p = p_0$ and use
as well $\|p - p_0\|_r \ge M\delta_n$ to obtain that for all $L > 0$ there exists $M$ large
enough such that

(34) $$\sup_{p \in \mathcal{P}_n :\, \|p\|_r \le 2\|p_0\|_r,\ \|p - p_0\|_r \ge M\delta_n} P^n(1 - \phi_n) \le \sup_{p \in \mathcal{P}_n :\, \|p\|_r \le 2\|p_0\|_r,\ \|p - p_0\|_r \ge M\delta_n} P^n\{\|p_n - P^n p_n\|_r \ge (M - M_0 - C(K))\delta_n\} \le 2e^{-Ln\varepsilon_n^2}.$$

We conclude from (32) and (33) that for any $L > 0$ there exists $n_0 < \infty$ such
that, for all $n \ge n_0$,

(35) $$\sup_{p \in \mathcal{P}_n :\, \|p - p_0\|_r \ge M\delta_n} P^n(1 - \phi_n) \le 2e^{-Ln\varepsilon_n^2}.$$

Now (31) and (35) prove (30) if $r > 1$. If $r = 1$ the above case distinction
is not necessary as $\|p\|_1 = 1$ always holds, so that the proof of the second
case applies with the full supremum over $\{p \in \mathcal{P}_n : \|p - p_0\|_1 \ge M\delta_n\}$. This
completes the proof of Theorem 2.

To prove Theorem 3 we argue similarly, and only have to slightly modify
the derivation of the error probabilities of the tests: when it is known that
the posterior concentrates on a fixed sup-norm ball of radius $B$, then we
can restrict the alternatives in (30) further to densities bounded by $B$, and,
using (28) with $p = p_0$ and the present choice of $\delta_n$, we also obtain

$$\sup_{p \in \mathcal{P}_n :\, \|p\|_\infty \le B,\ \|p - p_0\|_r \ge M\delta_n} P^n(1 - \phi_n) \le \sup_{p \in \mathcal{P}_n :\, \|p\|_\infty \le B,\ \|p - p_0\|_r \ge M\delta_n} P^n\{\|p_n - P^n p_n\|_r \ge (M - M_0 - C(K))\delta_n\} \le 2e^{-Ln\varepsilon_n^2}.$$
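The plug-in test used in the proof above can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes the kernel estimator is the Haar projection (a histogram with $2^j$ bins on $[0,1]$) and, for simplicity, that the null density $p_0$ is uniform on $[0,1]$; the function names and the argument `threshold` (standing for $M_0\delta_n$) are hypothetical.

```python
def haar_histogram(sample, j):
    """Haar-projection (histogram) density estimate on [0,1]: heights of 2**j bins."""
    bins = 2 ** j
    counts = [0] * bins
    for x in sample:
        k = min(int(x * bins), bins - 1)  # bin index; x = 1.0 goes to the last bin
        counts[k] += 1
    n = len(sample)
    return [c * bins / n for c in counts]


def lr_distance_to_uniform(heights, r):
    """L^r([0,1]) distance between a piecewise-constant density and p_0 = 1."""
    bins = len(heights)
    if r == float("inf"):
        return max(abs(h - 1.0) for h in heights)
    return (sum(abs(h - 1.0) ** r for h in heights) / bins) ** (1.0 / r)


def plug_in_test(sample, j, r, threshold):
    """phi_n = 1{ ||p_n - p_0||_r > threshold }, with threshold playing the role of M_0 * delta_n."""
    distance = lr_distance_to_uniform(haar_histogram(sample, j), r)
    return 1 if distance > threshold else 0
```

For example, an evenly spread sample is accepted, while a sample concentrated on $[0, 1/2)$ is rejected: `plug_in_test([(i + 0.5) / 64 for i in range(64)], 3, 2, 0.5)` and `plug_in_test([(i + 0.5) / 128 for i in range(64)], 3, 2, 0.5)`.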
4. Remaining proofs. 

4.1. Proofs of Propositions 1, 2 and 3. 

Proof of Proposition 1. Since ||C/q||oo < C" almost surely for some 
fixed constant C = C{B,a,ilj), we infer \\p^'°'\\a,r,oo ^ D{B,a,i/:) almost 
surely for 1 < r < oo. In particular the prior is supported in a ball of bounded 
densities, hence so is the posterior, and we can attempt to apply Theorems 2 
(for $r = 1, \infty$) and 3 (for $1 < r < \infty$), which we shall do with the choice
$\varepsilon_n = (n/\log n)^{-\alpha/(2\alpha+1)}$.

We verify the small ball estimate in the second condition in Theorem 2.
By Lemma 3.1 in [32] we can lower bound the prior probability in question
by $\Pr\{\|\log p_0 - U_\alpha\|_\infty < c\varepsilon_n\}$ for some constant $c > 0$. Since

$$\|h\|_\infty \le C(\phi, \psi)\max\Big(\sup_k |\alpha_k(h)|,\ \sum_l \sup_k 2^{l/2}|\beta_{lk}(h)|\Big)$$

for any continuous function $h$ on $[0,1]$ and some constant $C(\phi, \psi)$, we can
lower bound the last probability, writing $\alpha_k$, $\beta_{lk}$ for the wavelet coefficients
of $\log p_0$, by

$$\Pr\Big\{\max\Big(\sup_{k=0,\ldots,N}|\alpha_k - u_{0k}|,\ \sum_l \sup_k 2^{l/2}|\beta_{lk} - 2^{-l(\alpha+1/2)}u_{lk}|\Big) < c'\varepsilon_n\Big\}$$
$$= \Pr\Big\{\max_k |\alpha_k - u_{0k}| < c'\varepsilon_n\Big\}\,\Pr\Big\{\sum_l \max_k 2^{l/2}|\beta_{lk} - 2^{-l(\alpha+1/2)}u_{lk}| < c'\varepsilon_n\Big\},$$

where $N$, $J_0$ depend only on the wavelet basis (see before Definition 1).
Since $|\alpha_k| \le B$ and since the $u_{0k}$ are $U(-B, B)$, the first probability exceeds
$(c'\varepsilon_n/2B)^{N+1} = e^{-(N+1)\log(2B/c'\varepsilon_n)}$, which is bounded below by $e^{-c\log(1/\varepsilon_n)}$
for some $c > 0$ that depends only on $B$, $\alpha$ and the wavelet basis. For the second
probability set $b_{lk} = 2^{l(\alpha+1/2)}\beta_{lk}$, $l \ge J_0$, and $M(J) = \sum_{l=J_0}^{J}\sum_{k=0}^{2^l-1} 1 \le 2 \cdot 2^J$,
and note that $|b_{lk}| \le \|\log p_0\|_{\alpha,\infty} \le B$. Choosing $J = J_n > J_0$ large
enough and of order $\varepsilon_n \sim 2^{-J_n\alpha}$, this probability is bounded below by

$$\Pr\Big\{\sum_{l=J_0}^{J} 2^{-l\alpha}\sup_k |b_{lk} - u_{lk}| < c'\varepsilon_n - C(\psi, B)2^{-J\alpha}\Big\} \ge \Pr\Big\{\max_{l \le J}\max_{k < 2^l}|b_{lk} - u_{lk}| < c''\varepsilon_n\Big\} \ge \Big(\frac{c''\varepsilon_n}{2B}\Big)^{M(J)} \ge e^{-c'''\log(1/\varepsilon_n)/\varepsilon_n^{1/\alpha}}$$

for $n$ large enough and some $c''' > 0$ that depends only on $B$, $\alpha$ and the
wavelet basis. Summarizing we have, by definition of $\varepsilon_n$, that the $\Pi$-probability
in condition (2) of Theorem 2 is bounded from below by

(36) $$\Pr\{\|\log p_0 - U_\alpha\|_\infty < c\varepsilon_n\} \ge e^{-c\log(1/\varepsilon_n)}e^{-c'''\log(1/\varepsilon_n)/\varepsilon_n^{1/\alpha}} \ge e^{-Cn\varepsilon_n^2}$$

for some $C$ that depends only on $B$, $\alpha$ and the wavelet basis, which proves
that condition (2) holds.
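The product bound for independent uniform coefficients used above can be illustrated exactly. The following sketch assumes a finite list of target coefficients `coeffs` with $|a_k| \le B$ and independent draws $u_k \sim U(-B, B)$; the function names are hypothetical, and the code only illustrates why each coordinate contributes a factor of at least $\varepsilon/(2B)$ when $\varepsilon \le 2B$.

```python
import math


def coordinate_prob(a, eps, B):
    """P(|a - u| < eps) for u ~ Uniform(-B, B)."""
    lo = max(a - eps, -B)
    hi = min(a + eps, B)
    return max(hi - lo, 0.0) / (2.0 * B)


def small_ball_prob(coeffs, eps, B):
    """P(max_k |a_k - u_k| < eps) for independent u_k ~ Uniform(-B, B)."""
    return math.prod(coordinate_prob(a, eps, B) for a in coeffs)


def lower_bound(m, eps, B):
    """(eps / 2B)^m = exp(-m * log(2B / eps)), the bound used for m coefficients."""
    return (eps / (2.0 * B)) ** m
```

The worst case is a coefficient on the boundary, $a_k = \pm B$, where only half of the interval $(a_k - \varepsilon, a_k + \varepsilon)$ lies inside $[-B, B]$.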

We next verify the bias condition with Vn = supp(n) so that 11(7' \Vn) = 
0. We bound the L''-norm of the approximation errors of any element in Vn 
by a constant times 5„, where we take 7^ equal to logn to a sufficiently 
large power chosen below. Since 2"^" > cne^ > cn^' ^^o+i) ^g have, using 
50i([0,l])cL'^([0,l])andpGC"([0,l]),

oo / 2i \ l/'' oo 

\\KjM -VWr <cY^ 2^(1/2-1A-) ^ |^^^(p)|r < ,'(5,,) ^ 2"^", 

£=j„ Vfe=i / ^=j„ 

which is 0(en), so the bias condition is satisfied for some C{K) large enough, 
both for Vn, as well as for po- 

Finally condition (c) from Theorem 2 and (a), (b) from Theorem 3, as well 
as 6n — >• 0, are verified for this choice of e„ and under the conditions on a, r, 
except for the cases a = or a = 1/2, r = oo, where the result trivially follows 
from 6n being bounded from below by a constant multiple of logn (and as 
the prior is supported in a ^''-bounded set). D 

Proof of Proposition 2. We apply Theorem 2 with r = oo. We have 
from the proof of Theorem 5.1 in [13] that for e.„ = (logn)'^/^/n,K > 1, the 
small-ball estimate in condition (2) of Theorem 2 is satisfied. Choose 7^ in 
such a way that dn equals {logn)^/^/n where r] > k. For the bias, we take Vn 
to be the support of 11 and consider a Meyer-wavelet basis and the wavelet 
projection onto it, with 2'^" = c(logn)^'^, where c is a large enough constant 
that depends on mf{a:a G supp(G)}, and apply Proposition 4 in [25] with 
s = 2 and suitable cq, to see that \\Kj^{pF,a) —Pf,ct\\oo = o{l/n) uniformly in 
the support of 11. A more detailed proof is in the supplementary file [18]. D 

Proof of Proposition 3. Taking e„ = M'(n/ log n)~°/(^"+^), and 

noting En = O(ne^), we can take J„ such that 2^" < 2-^" < cne^ for ev- 
ery n, some c> 0. Taking K{x, y) equal to the Haar wavelet projection ker- 
nel (CDV-wavelet of regularity 5 = 0), we conclude that \\Kj^{p) — p\\r = 
Ilj^-a.s. Vn, so condition (1) in Theorem 2 is satisfied with Vn equal to 
the support of IIj^. The small ball estimate (2) follows, as in the proof of 
Theorem 1 ( [29] , pages 636 and 637, with k^ = 2-'" , and approximating po 
by Kj^{pq) s.t. \\Kj^{pq) — polli < en/2 for M' large enough), and from the 
second inequality in (36). The bias condition for $p_0$ is satisfied by standard
approximation properties of Haar wavelets. The result now follows from first
applying Theorem 2 with $r = 1, \infty$ and then using the conclusion that the
posterior concentrates on a $\|\cdot\|_\infty$ neighborhood of $p_0$ to invoke Theorem 3
for the cases $1 < r < \infty$. □

4.2. Proof of Proposition 4. We shall construct subsets of $\mathcal{P}$ on which
we can control the approximation errors from (15). We define Hölder spaces.
For $\alpha, \tau \ge 0$ real numbers, define the norm
$\|f\|_{\alpha,\infty,\tau} := \sum_{k=0}^{[\alpha]}\|f^{(k)}\|_\infty + H(\alpha, \tau, f)$, where

$$H(\alpha, \tau, f) = \sup_{0 < t < 1}\frac{\sup_{h :\, |h| \le t,\ x+h \in [0,1]}\ \sup_{x \in [0,1]}|f^{([\alpha])}(x+h) - f^{([\alpha])}(x)|}{t^{\alpha-[\alpha]}(\log t^{-1})^{\tau}},$$

and where we take $\|f^{(k)}\|_\infty = \infty$ if $f^{(k)}$ does not exist. Define, moreover,
$C^{\alpha,\infty,\tau}([0,1]) := \{f : [0,1] \to \mathbb{R} : \|f\|_{\alpha,\infty,\tau} < \infty\}$. The case $\tau = 0$ specialises to
the strict $\alpha$-Hölder case $C^\alpha([0,1])$.

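In case $1 \le r < \infty$, we shall use approximation theoretic properties of the
reproducing kernel Hilbert spaces (RKHSs) of $B_\alpha$, $\bar B_\alpha$, which are Sobolev
spaces. Recall that the RKHS $\mathbb{H}(1/2)$ of Brownian motion on $[0,1]$ is the
space of absolutely continuous functions that are zero at zero and whose first
derivatives are in $L^2([0,1])$, equipped with the inner product
$\langle f, g\rangle_{\mathbb{H}(1/2)} = \int_0^1 f'g'$. Then, the RKHS of integrated Brownian motion $B_\alpha$ is

$$\mathbb{H}(\alpha) = \Big\{\int_0^t\int_0^{t_{[\alpha]-1}}\cdots\int_0^{t_1} f(s)\,ds\,dt_1\cdots dt_{[\alpha]-1} : f \in \mathbb{H}(1/2)\Big\}$$

with inner product $\langle f, g\rangle_{\mathbb{H}(\alpha)} = \int_0^1 f^{([\alpha]+1)}g^{([\alpha]+1)}$. Finally, $f \in \bar{\mathbb{H}}(\alpha)$, the
RKHS of $\bar B_\alpha$, iff $f = P_{[\alpha]} + g$ where $P_{[\alpha]}$ is a polynomial of degree $[\alpha]$ and
$g \in \mathbb{H}(\alpha)$, and note that $P_{[\alpha]}(t) = \sum_{k=0}^{[\alpha]} f^{(k)}(0)t^k/k!$; the inner product in $\bar{\mathbb{H}}(\alpha)$
is $\langle f, g\rangle_{\bar{\mathbb{H}}(\alpha)} = \sum_{k=0}^{[\alpha]} f^{(k)}(0)g^{(k)}(0) + \int_0^1 f^{([\alpha]+1)}g^{([\alpha]+1)}$; see, for example, [33].
The spaces $\bar{\mathbb{H}}(\alpha)$ are precisely the Sobolev spaces $H_2^{\alpha+1/2}$, and other
equivalent norms may be used below.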

We will also require the following definition. For a $\mathbb{B}$-valued Gaussian
random vector $W$, $\mathbb{B}$ a Banach space, and for $w \in \mathbb{B}$, the "concentration
function" $\phi_w^W(\varepsilon)$ of $W$ at $w$ is defined as

(37) $$e^{-\phi_w^W(\varepsilon)} = \Pr\{\|W - w\| < \varepsilon\}.$$

The following result is a consequence of Borell's isoperimetric inequality [4],
and is essentially contained in the proof of Theorem 2.1 in [32].

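Proposition 5. Let $\alpha \in \{n - 1/2 : n \in \mathbb{N}\}$, denote by $\bar{\mathbb{H}}_1(\alpha)$ the unit ball
of $\bar{\mathbb{H}}(\alpha)$ and let $\mathbb{B}_1 = \{f \in C([0,1]) : \|f\|_\infty \le 1\}$. Let $\varepsilon_n$ satisfy $\phi_0^{\bar B_\alpha}(\varepsilon_n) \le n\varepsilon_n^2$
for all $n$. Then the released integrated Brownian motion process $\bar B_\alpha$ has a
version, that we continue denoting by $\bar B_\alpha$, such that for every $C > 0$, $D > 0$,

$$\Pr\{\bar B_\alpha \notin M_n\bar{\mathbb{H}}_1(\alpha) + \varepsilon_n\mathbb{B}_1\} \le De^{-(C+4)n\varepsilon_n^2},$$

where $M_n = M_n(C, D) = -2\Phi^{-1}(De^{-(C+4)n\varepsilon_n^2}) \simeq \sqrt{n}\,\varepsilon_n$ and $\Phi$ is the standard
normal distribution function.

Proof. Borell's inequality (e.g., Theorem 4.3.3 in [3]) implies

(38) $$\Pr\{\bar B_\alpha \notin M_n\bar{\mathbb{H}}_1(\alpha) + \varepsilon_n\mathbb{B}_1\} \le 1 - \Phi(a_n + M_n),$$

where $a_n$ solves the equation $\Phi(a_n) = \Pr\{\|\bar B_\alpha\|_\infty < \varepsilon_n\} \ge e^{-n\varepsilon_n^2}$. It then
follows ($C + 4 > 1$) that $a_n \ge -M_n/2$, which implies

$$1 - \Phi(a_n + M_n) \le \Phi(-M_n/2) = De^{-(C+4)n\varepsilon_n^2}.$$ □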
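The constant $M_n$ in the proposition above can be evaluated numerically. The sketch below is an illustration only: the argument `n_eps2` stands for $n\varepsilon_n^2$, the function names are hypothetical, and the standard normal distribution function is computed through `math.erfc` to keep accuracy in the far tail.

```python
import math
from statistics import NormalDist


def std_normal_cdf(x):
    """Phi(x), evaluated via erfc so that tiny tail probabilities keep their precision."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def m_n(n_eps2, C, D=1.0):
    """M_n = -2 * Phi^{-1}(D * exp(-(C + 4) * n_eps2)); requires 0 < D*exp(...) < 1."""
    p = D * math.exp(-(C + 4.0) * n_eps2)
    return -2.0 * NormalDist().inv_cdf(p)
```

By construction $\Phi(-M_n/2) = De^{-(C+4)n\varepsilon_n^2}$, and Gaussian tail asymptotics suggest $M_n$ grows like $2\sqrt{2(C+4)}\,\sqrt{n}\,\varepsilon_n$, which is the sense in which $M_n$ is of order $\sqrt{n}\,\varepsilon_n$.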
In particular, taking $D = \Pr\{\|\bar B_\alpha\|_\infty \le c\}$ for any $c > 0$, this proposition
gives

(39) $$\Pr\{\bar B_\alpha \notin M_n\bar{\mathbb{H}}_1(\alpha) + \varepsilon_n\mathbb{B}_1 \,\big|\, \|\bar B_\alpha\|_\infty \le c\} \le e^{-(C+4)n\varepsilon_n^2}$$

with $M_n$ depending on $C$ and $c$, and of the order $\sqrt{n}\,\varepsilon_n$.

In case $r = \infty$ we need a different result that reflects the almost sure
Hölder regularity of the trajectories of $B_\alpha$.

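Proposition 6. For all $\alpha \in \{n - 1/2 : n \in \mathbb{N}\}$, integrated Brownian motion
has a version, that we continue denoting by $B_\alpha$, with almost all its
sample paths in $C^{\alpha,\infty,1/2}([0,1])$ and for every $D > 0$ there exist $t_\alpha < \infty$ and
$L_\alpha < \infty$ such that

(40) $$\Pr\{\|B_\alpha\|_{\alpha,\infty,1/2} > t\} \le De^{-L_\alpha t^2}, \qquad t \ge t_\alpha.$$

The same is true for the processes $\bar B_\alpha = \sum_{k=0}^{[\alpha]} Z_k t^k/k! + B_\alpha$, that is,

(41) $$\Pr\{\|\bar B_\alpha\|_{\alpha,\infty,1/2} > t\} \le De^{-L_\alpha t^2}, \qquad t \ge t_\alpha,$$

for possibly different $L_\alpha(D)$ and $t_\alpha(D)$, for all $D > 0$.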

Proof. By a classical result of Lévy (see also Theorem IV.5 in [7])
Brownian motion $B_{1/2}$ has a version in $C^{1/2,\infty,1/2}([0,1])$. Since, for $\alpha > 1$,
by the definitions,

$$\|B_\alpha\|_{\alpha,\infty,1/2} = \|B_\alpha\|_\infty + \|B_\alpha'\|_{\alpha-1,\infty,1/2} = \|B_\alpha\|_\infty + \|B_{\alpha-1}\|_{\alpha-1,\infty,1/2},$$

and $\|B_\alpha\|_\infty < \infty$ a.s., induction extends the result to all $\alpha \in \{n - 1/2 : n \in \mathbb{N}\}$.

For < a < 1, Theorem III. 6 in [7] shows that the norms ||/|U,oo,i/2 and 

ooo 1/2 ^^^ equivalent, where ||/||q,oo 1/2 is defined as 



(42) 



with 



:sup{|yi(|,|y{|,max-|^|y/^j} 



(43) y( = 3-'/'if{l)-fm, 



/f^V^f/fi-V/P 



V 2^+1 J 2\ \2i J \ 2i 

for fc = 1, . . . , 2-', j = 0, 1, Obviously, || • ||^ ^ ^i^ is a supremum norm 

on a sequence space; more specifically, it is the sup of the absolute values 
of a countable number of linear functionals on the space C°'°°'^'^([0, 1]) 
(linear combinations of point evaluations). Hence Lemma 3.1 and inequality
(3.2) in [22] (this last inequality even with $\pi^2/2$ replaced by 2) apply
to $\|B_\alpha\|_{\alpha,\infty,1/2}$, giving (40) for $D = 1$. For $D < 1$, take $t_\alpha' \ge t_\alpha$ such that
$D \ge e^{-(L_\alpha/2)(t_\alpha')^2}$ and $L_\alpha' = L_\alpha/2$. If $\alpha > 1$, then the result follows by applying
these inequalities to the $C^{\alpha-[\alpha],\infty,1/2}$-norm of the $[\alpha]$th derivative of the
process and to the sup norms of the process and of its derivatives of order
smaller than $[\alpha]$. Since (40) is obviously true for the processes $Z_k t^k$, it is
true as well for $\bar B_\alpha$, possibly with a different constant, which gives (41). □

Again, taking $D = \Pr\{\|\bar B_\alpha\|_\infty \le c\}$, for any $c > 0$, this proposition gives

(44) $$\Pr\{\|\bar B_\alpha\|_{\alpha,\infty,1/2} > t \,\big|\, \|\bar B_\alpha\|_\infty \le c\} \le e^{-L_\alpha t^2}, \qquad t \ge t_\alpha,$$

$L_\alpha$ and $t_\alpha$ depending on $c$.

These two consequences of Borell's inequality imply that the integrated
Brownian motions concentrate on suitable subsets of $C([0,1])$, and the
following lemma achieves the same for the normalized trajectories of the
processes $e^{\bar B_\alpha}$.

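Lemma 2. Let $\alpha \in \{n - 1/2 : n \in \mathbb{N}\}$, and let $K_j$ be a CDV-projection
kernel of regularity $\alpha + 1/2$, at resolution $j \ge 0$.

(1) (Case $1 \le r < \infty$.) Let $f \in \{M_n\bar{\mathbb{H}}_1(\alpha) + \varepsilon_n\mathbb{B}_1,\ \|f\|_\infty \le c\}$, where $\bar{\mathbb{H}}_1(\alpha)$
is the unit ball of the RKHS of $\bar B_\alpha$, and set $p = e^f/\int_0^1 e^f$. Then, for
$\bar r = \max(2, r)$ and some $C > 0$,

$$\|K_j(p) - p\|_r \le C\big(M_n 2^{-j(\alpha+1/\bar r)} + \varepsilon_n\big).$$

(2) (Case $r = \infty$.) Let $f$ satisfy $\|f\|_\infty \le c$ and $\|f\|_{\alpha,\infty,1/2} \le L\sqrt{n}\,\varepsilon_n$, and
let $p$ be as above. Then, for some $C > 0$,

$$\|K_j(p) - p\|_\infty \le C\sqrt{n}\,\varepsilon_n 2^{-j\alpha}\sqrt{j}.$$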



Proof. We first consider $1 \le r < \infty$. Since $\|f\|_\infty \le c$ we have $e^{-c} \le \int_0^1 e^f \le e^c$
so, $\int K_j(x, y)(\cdot)(y)\,dy$ being a linear operator, it suffices to bound
$\|K_j(e^f) - e^f\|_r$. Writing $f = f_1 + f_2$ with $f_1 \in M_n\bar{\mathbb{H}}_1(\alpha)$ and $f_2 \in \varepsilon_n\mathbb{B}_1$ we
see that $\|f_2\|_\infty \le \varepsilon_n \le c$, $\|f_1\|_\infty \le c + \varepsilon_n \le 2c$, and in particular,
$|e^{f_2(x)} - e^{f_2(y)}| \le e^c|f_2(x) - f_2(y)|$. Note also that, for some constant $C(K) < \infty$,
$\|2^{-j}K_j(x, x + 2^{-j}\cdot)\|_1 \le C(K)$. Then we have

\K,{ef)-ef\{x) 

2-^Kj{x,x + 2-^n)(e(-^i+^2)(x-+2-i«) _ e(/i+/2)(x')) d^ 



< 



c 



/"2-^Kj(x,x + 2-^u)(e^i(^+2-^") - e-^i(^-)) d« 
+






The L''([0, l])-norni of the second term is bounded by a fixed constant 
times En, and it remains to control the L'"([0, l])-norm of the first term in 
the bound. Note that the Sobolev space M{a) = if °+i/2 ig contained in the 

Besov space B22 ([0,1]), which itself is continuously imbedded into the 
Besov space B"^^^^~^^^^^^''{[0,1]) = B^^^^''{[0,1]); cf. Remark 1. We con- 
clude, for some constant C", that \\Kj{ef^) - ef^\\r < C"||/i||e(„)2-^(°+V*=) 
from the approximation properties of wavelet projections on Besov spaces 
(Definition 1). This establishes the bound in the first part of the lemma. 

For the case r = 00, note that, / being bounded by c, the chain rule gives 
that there exists C{c, a) such that 

(45) l|e^lU,oo,i/2 < C{c,a)m\a,oo,i/2 + 1). 

We conclude from a standard bias bound for wavelet projections that 
||i^i(eO-e^lloo<c(||/L,oo,i/2 + l)2-^"V7 which, in view of e-^<£e/<e^ 
gives the overall inequality. D 

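The choice $j = J_n$ with $2^{J_n} \sim n\varepsilon_n^2$, relevant in Theorems 2 and 3, gives,
for $p$ satisfying the hypotheses of the previous lemma, the bounds

(46) $$\|K_{J_n}(p) - p\|_r \le C\big((n\varepsilon_n^2)^{-\alpha} + \varepsilon_n\big) \quad\text{for } 1 \le r \le 2$$

and

(47) $$\|K_{J_n}(p) - p\|_r \le C\big(\sqrt{n}\,\varepsilon_n(n\varepsilon_n^2)^{-(\alpha+1/r)} + \varepsilon_n\big) \quad\text{for } 2 < r < \infty$$

as well as

(48) $$\|K_{J_n}(p) - p\|_\infty \le C\sqrt{n}\,\varepsilon_n(n\varepsilon_n^2)^{-\alpha}\sqrt{\log(n\varepsilon_n^2)}.$$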
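The size of these bias bounds can be worked out for a concrete rate. The sketch below is illustrative only: it assumes $\varepsilon_n = n^{-\alpha/(2\alpha+1)}$ (constants and logarithmic factors dropped), sets the constant $C$ to one, and uses the hypothetical function name `leading_bias_term` for the first term of each bound.

```python
import math


def leading_bias_term(n, alpha, r):
    """First term of the bias bounds for 1 <= r <= 2, 2 < r < inf and r = inf, with C = 1.

    Assumes eps_n = n**(-alpha / (2*alpha + 1)) and 2**J_n = n * eps_n**2.
    """
    eps = n ** (-alpha / (2.0 * alpha + 1.0))
    resolution = n * eps ** 2  # plays the role of 2**J_n
    if r <= 2:
        return resolution ** (-alpha)
    if r == float("inf"):
        return math.sqrt(n) * eps * resolution ** (-alpha) * math.sqrt(math.log(resolution))
    return math.sqrt(n) * eps * resolution ** (-(alpha + 1.0 / r))
```

Under this assumption the term for $r \le 2$ equals $\varepsilon_n$ exactly, whereas for $2 < r < \infty$ it equals $\varepsilon_n\, n^{(1/2 - 1/r)/(2\alpha+1)}$, which makes the deterioration of the bound beyond $r = 2$ explicit.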

The last auxiliary fact that we will require about B^ is a small ball 
probability estimate, concretely an upper bound for the concentration func- 
tion (j)^°' (e) as e approaches zero. 

Proposition 7. Let Ba,a £ {n — 1/2 -.n £ N} be integrated Brownian 
motion, considered as a Gaussian vector taking values in the Banach spa- 
ce C([0,1]), and let w £ C" {[0,1]). Then, (/>^"(e) = 0(e-i/"), and the same 
is true for cp^" if we further assume vu^^\0) = 0, k <[a\. 

Proof. Since B^ = W201 in [24] and it also equals a constant times Ra 
in [32], this proposition simply combines Theorem 2.1 in [24] and Theo- 
rem 4.3 in [32]. D 
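This result applies to the "conditional" concentration function: if $\|w_0\|_\infty \le c/2$
and $\varepsilon \le c/2$, then

(49) $$\Pr\{\|\bar B_\alpha - w_0\|_\infty < \varepsilon \,\big|\, \|\bar B_\alpha\|_\infty \le c\} = \frac{\Pr\{\|\bar B_\alpha - w_0\|_\infty < \varepsilon,\ \|\bar B_\alpha\|_\infty \le c\}}{\Pr\{\|\bar B_\alpha\|_\infty \le c\}} = \frac{e^{-\phi_{w_0}^{\bar B_\alpha}(\varepsilon)}}{\Pr\{\|\bar B_\alpha\|_\infty \le c\}}.$$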

We are now in a position to apply Theorems 2 and 3 to prove Proposition 4.
To ease notation define $I(w) = e^w/\int_0^1 e^{w(t)}\,dt$, $w \in C([0,1])$, and
record that, for $\|w\|_\infty \le c$,

(50) $$|I(w)| \le L(|w| + 1),$$

where $L$ depends only on $c$.
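The map $I(w)$ turns a bounded function into a probability density. A minimal discretised sketch follows, assuming $w$ is given by its values on a uniform grid of $[0,1]$ and the integral is replaced by a Riemann sum; the function name `normalise_exp` is hypothetical.

```python
import math


def normalise_exp(w_values):
    """Discretised I(w) = exp(w) / integral_0^1 exp(w(t)) dt on a uniform grid of [0,1]."""
    m = len(w_values)
    integral = sum(math.exp(v) for v in w_values) / m  # Riemann sum with step 1/m
    return [math.exp(v) / integral for v in w_values]
```

If $\|w\|_\infty \le c$, then $e^{-c} \le \int_0^1 e^{w} \le e^{c}$, so the resulting density takes values in $[e^{-2c}, e^{2c}]$; this is what keeps the prior supported in a sup-norm bounded set.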

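Set $w_0 = \log p_0$, so that, since $\|w_0\|_\infty \le c/2$ and $p_0$ is a density, hence
$p_0 = I(w_0)$, Lemma 3.1 in [32] gives that if $p = I(w)$ for $w = \bar B_\alpha(\omega)$ for some
$\omega \in \Omega$, and $\|w\|_\infty \le c$, then $-P_0\log\frac{p}{p_0} \le R\|w - w_0\|_\infty^2$, and
$P_0\big(\log\frac{p}{p_0}\big)^2 \le R\|w - w_0\|_\infty^2$ for some $R < \infty$ (that depends on $c$). Hence, for any $\varepsilon > 0$
such that $R^{-1/2}\varepsilon \le c/2$,

(51) $$\Pi\Big\{p \in \mathcal{P} : -P_0\log\frac{p}{p_0} \le \varepsilon^2,\ P_0\Big(\log\frac{p}{p_0}\Big)^2 \le \varepsilon^2\Big\} \ge \Pr\{\|\bar B_\alpha - w_0\|_\infty \le R^{-1/2}\varepsilon \,\big|\, \|\bar B_\alpha\|_\infty \le c\}.$$

Since $w_0$ is in $C^\alpha([0,1])$, it follows from Proposition 7 that $\phi_{w_0}^{\bar B_\alpha}(\varepsilon) = O(\varepsilon^{-1/\alpha})$
as $\varepsilon \to 0$, say, there exist $c_1$ large enough and $\varepsilon_1 > 0$ such that

$$\phi_{w_0}^{\bar B_\alpha}(\varepsilon) \le c_1\varepsilon^{-1/\alpha} \quad\text{for all } \varepsilon \le \varepsilon_1.$$

Then we have, for $\varepsilon_n = (c_1/n)^{\alpha/(2\alpha+1)}$, from some $n$ on, both

$$\phi_{w_0}^{\bar B_\alpha}(R^{-1/2}\varepsilon_n) \le c_1R^{1/(2\alpha)}\varepsilon_n^{-1/\alpha} \quad\text{and}\quad \phi_0^{\bar B_\alpha}(\varepsilon_n) \le n\varepsilon_n^2.$$

Hence, for these $n$, by (49),

(52) $$\Pr\{\|\bar B_\alpha - w_0\|_\infty \le R^{-1/2}\varepsilon_n \,\big|\, \|\bar B_\alpha\|_\infty \le c\} \ge e^{-Cn\varepsilon_n^2},$$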

where C = cii?^' '^"^ . This proves condition (2) in Theorems 2, 3 for these C, e„ 
To proceed with the verification of the conditions of Theorem 2, take Vn = 
{ Iiw):w g {M „Mi(a) + e„Bi}} if r < oo and P„ = {I{w) : ||u;|U,oo,i/2 < 
Y^(C + 4) / Lay/n£n} if r = OO, and note that condition (1) in Theorem 2 is 
satisfied for these choices in view of Propositions 5 and 6; see (39) and (44). 
The bias condition is satisfied for the above choice of e„, 7^ = 1 if r < 00 
and 7rj, = \/logn if r = 00, in view of Lemma 2; cf. also (46), (47), (48). 
Finally the additional restrictions on $\varepsilon_n$ in Theorems 2 and 3 are also
satisfied, unless $\alpha = 1/2$, $r = \infty$. In this case the rate of contraction $\delta_n$ exceeds
a constant multiple times $\sqrt{\log n}$, so that the result follows trivially from
the fact that the prior is supported in a sup-norm bounded set.

4.3. Proof of Theorem 1. Observing $Y^{(n)}$ is equivalent to observing its
action on the basis,

(53) $$y_k = \int_0^1 \phi_k(t)\,dY^{(n)}(t) = \langle f, \phi_k\rangle + \frac{1}{\sqrt{n}}\int_0^1 \phi_k(t)\,dB(t) := \theta_k + \frac{1}{\sqrt{n}}g_k, \qquad k = 0, \ldots, N-1,$$
$$y_{lk} = \int_0^1 \psi_{lk}(t)\,dY^{(n)}(t) := \theta_{lk} + \frac{1}{\sqrt{n}}g_{lk}, \qquad k = 0, \ldots, 2^l - 1,\ l \ge J_0,$$

with the variables $g_k$, $g_{lk}$ all i.i.d. $N(0, 1)$. The observed process, still
denoted by $Y^{(n)}$, can thus be viewed as a random element $Y^{(n)} = (y_k, y_{lk})^T$
of $\ell_2$, where $y_k$ is $N(\theta_k, 1/n)$, and $y_{lk}$ is $N(\theta_{lk}, 1/n)$, all independent.
Likewise the function $f_0$ to be estimated becomes the vector $\theta_0 = (\theta_k^0, \theta_{lk}^0)^T$ of the
coefficients of its wavelet expansion, that is, $\theta_k^0 = \langle f_0, \phi_k\rangle$ and $\theta_{lk}^0 = \langle f_0, \psi_{lk}\rangle$,
and any prior $\Pi$ on $L^2$ maps onto a prior, still denoted by $\Pi$, on the
parameter space $\theta = (\theta_k, \theta_{lk})^T \in \ell_2$.

The posterior $\Pi(\cdot|Y^{(n)})$ is then the law of $\theta$ given the observed
process $Y^{(n)}$. Standard results on Gaussian measures on $\ell_2$ imply that if the
prior $\Pi$ on $\ell_2$ is a centered Gaussian vector of trace class covariance $S$, then
the posterior probability law given $Y^{(n)}$, $\Pi_n = \Pi(\cdot|Y^{(n)})$, is also Gaussian, with
mean $\hat\theta(Y) = E_{\Pi_n}(\theta|Y^{(n)}) = S(S + I/n)^{-1}Y^{(n)} = S(S + I/n)^{-1}(y_k; y_{lk})^T$ and
with covariance $S|Y^{(n)} = S(nS + I)^{-1}$; see, for example, Theorem 3.2 in [35].

We will drop the superindex $(n)$ from the processes $Y^{(n)}$ and $Y_0^{(n)}$ from now
on to expedite notation.
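For a diagonal covariance the posterior formulas above act coordinate by coordinate. The sketch below is an illustration under that assumption: `s` denotes a single prior variance (a diagonal entry of $S$), `y` the corresponding observation with noise variance $1/n$, and the second function recomputes the same quantities from the usual precision-weighted form of Gaussian conjugacy as a cross-check. Both function names are hypothetical.

```python
def posterior_coordinate(y, s, n):
    """Posterior mean and variance from S(S + I/n)^{-1} y and S(nS + I)^{-1}, one coordinate."""
    mean = s / (s + 1.0 / n) * y
    var = s / (n * s + 1.0)
    return mean, var


def posterior_by_precision(y, s, n):
    """Same posterior via precisions: prior precision 1/s, likelihood precision n."""
    precision = 1.0 / s + n
    return (n * y) / precision, 1.0 / precision
```

With `s = 1` this gives mean $y/(1 + 1/n)$ and variance $1/(n+1)$; with `s` equal to $\mu_l$ it gives mean $\mu_l y/(\mu_l + 1/n)$ and variance $\mu_l/(n\mu_l + 1)$.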

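The posterior $\Pi_n$ gives rise to a Gaussian measure on $L^2([0,1])$ by simply
"undoing" the isometry, that is, by taking the law of the random wavelet
series in $L^2([0,1])$ with coefficients drawn from $\Pi_n$, equal to

$$f = \sum_{k=0}^{N-1}\Big[\frac{1}{1 + 1/n}\,y_k + \Big(\frac{1}{n+1}\Big)^{1/2}\bar g_k\Big]\phi_k + \sum_{l=J_0}^{\infty}\sum_{k=0}^{2^l-1}\Big[\frac{\mu_l}{\mu_l + 1/n}\,y_{lk} + \Big(\frac{\mu_l}{n\mu_l + 1}\Big)^{1/2}\bar g_{lk}\Big]\psi_{lk}$$
$$= E_{\Pi_n}(f|Y) + \sum_{k=0}^{N-1}\Big(\frac{1}{n+1}\Big)^{1/2}\bar g_k\phi_k + \sum_{l=J_0}^{\infty}\sum_{k=0}^{2^l-1}\Big(\frac{\mu_l}{n\mu_l + 1}\Big)^{1/2}\bar g_{lk}\psi_{lk},$$

where the $\bar g$ variables are i.i.d. $N(0, 1)$, and $y_k$, $y_{lk}$ are, as defined above, the
integrals of the wavelet basis functions with respect to $dY(t)$. Under $dY_0(t) =
f_0(t)\,dt + dB(t)/\sqrt{n}$, we have $y_k = \langle f_0, \phi_k\rangle + g_k/\sqrt{n}$, $y_{lk} = \langle f_0, \psi_{lk}\rangle + g_{lk}/\sqrt{n}$,
where the $g_k$, $g_{lk}$ are again i.i.d. $N(0, 1)$, independent of the variables $\bar g$. So,
the posterior given $Y_0$ integrates the $\bar g$ variables, and $E_{Y_0}$ integrates the $g$
variables, and we have

(55) $$E_{Y_0}\Pi_n\{f : \|f - f_0\|_\infty > M\varepsilon_n \,|\, Y_0\}$$
$$= \Pr\Big\{\Big\|\sum_{k=0}^{N-1}\Big[\frac{-1/n}{1 + 1/n}\langle f_0, \phi_k\rangle + \frac{1}{\sqrt{n}(1 + 1/n)}g_k + \Big(\frac{1}{n+1}\Big)^{1/2}\bar g_k\Big]\phi_k$$
$$\qquad + \sum_{l=J_0}^{\infty}\sum_{k=0}^{2^l-1}\Big[\frac{-1/n}{\mu_l + 1/n}\langle f_0, \psi_{lk}\rangle + \frac{\mu_l}{\sqrt{n}(\mu_l + 1/n)}g_{lk} + \Big(\frac{\mu_l}{n\mu_l + 1}\Big)^{1/2}\bar g_{lk}\Big]\psi_{lk}\Big\|_\infty > M\varepsilon_n\Big\}$$
$$= \Pr\{\|E_{Y_0}(E_{\Pi_n}(f|Y_0) - f_0) + G\|_\infty > M\varepsilon_n\},$$

where $G$ is the centered Gaussian process

$$G(t) = \sum_{k=0}^{N-1}\Big[\frac{1}{\sqrt{n}(1 + 1/n)}g_k + \Big(\frac{1}{n+1}\Big)^{1/2}\bar g_k\Big]\phi_k(t) + \sum_{l=J_0}^{\infty}\sum_{k=0}^{2^l-1}\Big[\frac{\mu_l}{\sqrt{n}(\mu_l + 1/n)}g_{lk} + \Big(\frac{\mu_l}{n\mu_l + 1}\Big)^{1/2}\bar g_{lk}\Big]\psi_{lk}(t)$$

and

$$E_{Y_0}(E_{\Pi_n}(f|Y_0) - f_0) = \sum_{k=0}^{N-1}\frac{-1/n}{1 + 1/n}\langle f_0, \phi_k\rangle\phi_k + \sum_{l=J_0}^{\infty}\sum_{k=0}^{2^l-1}\frac{-1/n}{\mu_l + 1/n}\langle f_0, \psi_{lk}\rangle\psi_{lk}.$$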
It suffices to prove the theorem for $r = \infty$. We will apply Borell's [4] inequality
(a consequence thereof, in fact, equation (3.2) in [22], page 57) to the
probability in (55), and for this we need to estimate $\|E_{Y_0}(E_{\Pi_n}(f|Y_0) - f_0)\|_\infty$,
$E\|G\|_\infty$ and $\|E(G^2(\cdot))\|_\infty$.

Choose Jn > Jo such that 2-^" ~ {n/logn)^/^'^°'+^\ Since /o G C°([0,1]) 
and llX^fclV'ftllloo < C2^/^, we obtain 



N~i _^, 

1 + \ n 
fc=o ' 



Ok)(Pk 



< 


Af-1 


oo 


fc=0 



C ^Ci 



n + 1 n 



and 



oo 2*-l 

EE 

£=Jo fc=0 



-1/n 



m + 1/"- 





oo 


|2'^-1 II 


{h,'4^t,k)i'i,k 


^ I^ I^ l^^^^l 




oo ^=J() 


1 fc=0 II 






■^'1 2^^'-' 



<c" V^+ V T 

\ '^-^ nan '^-^ 

'1 \ a/(2a+l) 

log n \ " ^ 



■to 



<C2 



n 



where $C_1$ and $C_2$ depend only on the wavelet basis, $\alpha$ and $\|f_0\|_{\alpha,\infty}$. Collecting
the last two sets of inequalities yields the bound

(56) $$\|E_{Y_0}(E_{\Pi_n}(f|Y_0) - f_0)\|_\infty \le \bar C_1\Big(\frac{\log n}{n}\Big)^{\alpha/(2\alpha+1)}$$

for some $\bar C_1 < \infty$. To bound $E\|G\|_\infty$, recall that for any sequence of centered
normal random variables $Z_j$,

(57) $$E\max_{1 \le j \le N}|Z_j| \le C\sqrt{\log N}\,\max_{j \le N}(EZ_j^2)^{1/2},$$

where $C$ is a universal constant. Therefore, from the definitions of $J_n$, $\mu_l$,

1 / 1 Nl/2 



E 



E 



< 



and, using /i^ < n ^ for .£ > J„, 

00 2^-1 

^EE 



V^(l + 1/n) 

k 
'n 



fJ-e 



9k + 



n+1 



9k 



4>k 



1 1,^,1 

ri(l + 1/n)^ n-|- 1 / fc 



00 
1/2 



e=jQ fe=o 



Vn(/Z£ + 1/n) 



9£k + 



fJ-e 



n/i£ + 1 



1/2 



S'^fc 



V-ft' 
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^,2 „, \l/2 

2 ' Em.ax\gik\\ —, 

h^oi \ n.liiii A 

i=Jo 



;^ k<2t \n{ne + l/ny nfxi + lj 

n(^£ + l/n)2 nfii + l^ 



oo / 2 \ 1/2 

i=Jo 

(■^" lolp , , 



n I \ n 



1 \ a/(2a+l) 

log n^ '^ ' 



n 



Conclude 

(58) ^||G||oo<C2 

for some C2 < oo. Finally, 

^"V 1 1 \ 
EG'^it) = V — — TT + (Plit) 

00 2^^-! / 2 \ 

(59) +V V( , ^^, ,o +-^]i'l{t) 

i + f_ + 2--^"(2"+l))<C3 — . 

n n ) n 

So, setting $\varepsilon_n = (n/\log n)^{-\alpha/(2\alpha+1)}$, the estimates (56), (58) and (59)
together with inequality (3.2) on page 57 of [22], give

(60) $$\Pr\{\|E_{Y_0}(E_{\Pi_n}(f|Y_0) - f_0) + G\|_\infty > M\varepsilon_n\}$$
$$\le \Pr\{\|G\|_\infty - E\|G\|_\infty > M\varepsilon_n - \|E_{Y_0}(E_{\Pi_n}(f|Y_0) - f_0)\|_\infty - E\|G\|_\infty\}$$
$$\le \Pr\{\|G\|_\infty - E\|G\|_\infty > (M - \bar C_1 - \bar C_2)\varepsilon_n\}.$$

Collecting (55) and (60) and taking into account that $\varepsilon_n^2 \sim 2^{J_n}J_n/n$
completes the proof.
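The maximal inequality (57) used in the proof above can be illustrated by simulation. The sketch below is only a sanity check under stated assumptions: it uses independent standard normal variables (the inequality itself does not require independence) and compares a Monte Carlo estimate of $E\max_{j \le N}|Z_j|$ with the explicit bound $\sigma\sqrt{2\log(2N)}$, one standard form of such a bound with an explicit constant; the function names are hypothetical.

```python
import math
import random


def mean_max_abs_gaussian(N, reps=200, seed=0):
    """Monte Carlo estimate of E max_{j<=N} |Z_j| for independent standard normals."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += max(abs(rng.gauss(0.0, 1.0)) for _ in range(N))
    return total / reps


def gaussian_max_bound(N, sigma=1.0):
    """Explicit bound sigma * sqrt(2 log(2N)) on E max_{j<=N} |Z_j| when E Z_j^2 <= sigma^2."""
    return sigma * math.sqrt(2.0 * math.log(2.0 * N))
```

In the proof the inequality is applied level by level, with $N$ of order $2^l$ and the standard deviations read off from the coefficients of $G$.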
SUPPLEMENTARY MATERIAL

Supplement to "Rates of contraction for posterior distributions in $L^r$-metrics,
$1 \le r \le \infty$" (DOI: 10.1214/11-AOS924SUPP; .pdf). This supplement
contains a detailed proof of Lemma 1 and an expanded proof of
Proposition 2 from the mentioned article.

REFERENCES 

[1] Barron, A., Schervish, M. J. and Wasserman, L. (1999). The consistency of
posterior distributions in nonparametric problems. Ann. Statist. 27 536-561.
MR1714718

[2] Birgé, L. (1983). Approximation dans les espaces métriques et théorie de
l'estimation. Z. Wahrsch. Verw. Gebiete 65 181-237. MR0722129

[3] Bogachev, V. I. (1998). Gaussian Measures. Mathematical Surveys and Monographs
62. Amer. Math. Soc., Providence, RI. MR1642391

[4] Borell, C. (1975). The Brunn-Minkowski inequality in Gauss space. Invent. Math.
30 207-216. MR0399402

[5] Bousquet, O. (2003). Concentration inequalities for sub-additive functions using
the entropy method. In Stochastic Inequalities and Applications. Progress in
Probability 56 213-247. Birkhäuser, Basel. MR2073435

[6] Castillo, I. (2011). A semiparametric Bernstein-von Mises theorem for Gaussian
process priors. Probab. Theory Related Fields. To appear.

[7] Ciesielski, Z., Kerkyacharian, G. and Roynette, B. (1993). Quelques espaces
fonctionnels associés à des processus gaussiens. Studia Math. 107 171-204.
MR1244574

[8] Cohen, A., Daubechies, I. and Vial, P. (1993). Wavelets on the interval and fast
wavelet transforms. Appl. Comput. Harmon. Anal. 1 54-81. MR1256527

[9] Ghosal, S., Ghosh, J. K. and Ramamoorthi, R. V. (1999). Posterior consistency
of Dirichlet mixtures in density estimation. Ann. Statist. 27 143-158. MR1701105

[10] Ghosal, S., Ghosh, J. K. and van der Vaart, A. W. (2000). Convergence rates
of posterior distributions. Ann. Statist. 28 500-531. MR1790007

[11] Ghosal, S. and van der Vaart, A. (2007). Convergence rates of posterior
distributions for non-i.i.d. observations. Ann. Statist. 35 192-223. MR2332274

[12] Ghosal, S. and van der Vaart, A. (2007). Posterior convergence rates of Dirichlet
mixtures at smooth densities. Ann. Statist. 35 697-723. MR2336864

[13] Ghosal, S. and van der Vaart, A. W. (2001). Entropies and rates of convergence
for maximum likelihood and Bayes estimation for mixtures of normal densities.
Ann. Statist. 29 1233-1263. MR1873329




[14] Giné, E. and Guillou, A. (2002). Rates of strong uniform consistency for
multivariate kernel density estimators. Ann. Inst. Henri Poincaré Probab. Stat. 38
907-921. MR1955344

[15] Giné, E. and Mason, D. (2007). On local U-statistic processes and the estimation
of densities of functions of several sample variables. Ann. Statist. 35 1105-1145.

[16] Giné, E. and Nickl, R. (2008). Adaptation on the space of finite signed measures.
Math. Methods Statist. 17 113-122. MR2429123

[17] Giné, E. and Nickl, R. (2009). Uniform limit theorems for wavelet density
estimators. Ann. Probab. 37 1605-1646. MR2546757

[18] Giné, E. and Nickl, R. (2011). Supplement to "Rates of contraction for posterior
distributions in $L^r$-metrics, $1 \le r \le \infty$." DOI:10.1214/11-AOS924SUPP.

[19] Härdle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1998).
Wavelets, Approximation, and Statistical Applications. Lecture Notes in
Statistics 129. Springer, New York. MR1618204

[20] Ingster, Y. I. (1993). Asymptotically minimax hypothesis testing for nonparametric
alternatives. I. Math. Methods Statist. 2 85-114. MR1257978

[21] Le Cam, L. (1986). Asymptotic Methods in Statistical Decision Theory. Springer,
New York. MR0856411

[22] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry
and Processes. Ergebnisse der Mathematik und Ihrer Grenzgebiete (3) [Results
in Mathematics and Related Areas (3)] 23. Springer, Berlin. MR1102015

[23] Lenk, P. J. (1991). Towards a practicable Bayesian nonparametric density estimator.
Biometrika 78 531-543. MR1130921

[24] Li, W. V. and Linde, W. (1998). Existence of small ball constants for
fractional Brownian motions. C. R. Acad. Sci. Paris Sér. I Math. 326 1329-1334.
MR1649147

[25] Lounici, K. and Nickl, R. (2011). Global uniform risk bounds for wavelet
deconvolution estimators. Ann. Statist. 39 201-231. MR2797844

[26] Meyer, Y. (1992). Wavelets and Operators. Cambridge Studies in Advanced
Mathematics 37. Cambridge Univ. Press, Cambridge. MR1228209

[27] Nickl, R. (2007). Donsker-type theorems for nonparametric maximum likelihood
estimators. Probab. Theory Related Fields 138 411-449. MR2299714

[28] Schwartz, L. (1965). On Bayes procedures. Z. Wahrsch. Verw. Gebiete 4 10-26.
MR0184378

[29] Scricciolo, C. (2007). On rates of convergence for Bayesian density estimation.
Scand. J. Statist. 34 626-642. MR2368802

[30] Shen, X. and Wasserman, L. (2001). Rates of convergence of posterior distributions.
Ann. Statist. 29 687-714. MR1865337

[31] Talagrand, M. (1996). New concentration inequalities in product spaces. Invent.
Math. 126 505-563. MR1419006

[32] van der Vaart, A. W. and van Zanten, J. H. (2008). Rates of contraction of
posterior distributions based on Gaussian process priors. Ann. Statist. 36 1435-
1463. MR2418663

[33] van der Vaart, A. W. and van Zanten, J. H. (2008). Reproducing kernel Hilbert
spaces of Gaussian priors. In Pushing the Limits of Contemporary Statistics:
Contributions in Honor of Jayanta K. Ghosh. Inst. Math. Stat. Collect. 3 200-
222. IMS, Beachwood, OH. MR2459226

[34] van der Vaart, A. W. and van Zanten, J. H. (2009). Adaptive Bayesian
estimation using a Gaussian random field with inverse gamma bandwidth. Ann.
Statist. 37 2655-2675. MR2541442






E. GINE AND R. NICKL 



[35] Zhao, L. H. (2000). Bayesian aspects of some nonparametric problems. Ann. Statist. 
28 532-552. MR1790008 



Department of Mathematics
University of Connecticut
Storrs, Connecticut 06269-3009
USA
E-mail: gine@math.uconn.edu

Statistical Laboratory
Department of Pure Mathematics
and Mathematical Statistics
University of Cambridge
Wilberforce Road
CB3 0WB, Cambridge
United Kingdom
E-mail: r.nickl@statslab.cam.ac.uk



